PREFACE



Multisensor data fusion is an emerging technology applied to Department of Defense (DoD) areas such as 
automated target recognition (ATR), identification-friend-foe-neutral (IFFN) recognition systems, battle- 
field surveillance, and guidance and control of autonomous vehicles. Non-DoD applications include mon- 
itoring of complex machinery, environmental surveillance and monitoring systems, medical diagnosis, and 
smart buildings. Techniques for data fusion are drawn from a wide variety of disciplines, including signal 
processing, pattern recognition, statistical estimation, artificial intelligence, and control theory. The rapid 
evolution of computers, proliferation of microelectromechanical systems (MEMS) sensors, and the
maturation of data fusion technology provide a basis for utilization of data fusion in everyday applications. 

This book is intended to be a comprehensive resource for data fusion system designers and researchers, 
providing information on terminology, models, algorithms, systems engineering issues, and examples of 
applications. The book is divided into four main parts. Part I introduces data fusion terminology and 
models. Chapter 1 provides a general introduction to data fusion and terminology. Chapter 2 introduces 
the Joint Directors of Laboratories (JDL) data fusion process model, widely used to assist in understanding 
DoD applications. In Chapter 3, Jeffrey Uhlmann discusses the problem of multitarget, multisensor 
tracking and introduces the challenges of data association and correlation. Chapter 4, by Ed Waltz, 
introduces concepts of image and spatial data fusion, and in Chapter 5 Richard Brooks and Lynne Grewe 
describe issues of data registration for image fusion. Chapter 6, written by Richard Antony, discusses 
issues of data fusion focused on situation assessment and database management. Finally, in Chapter 7, 
Joseph Carl contrasts some approaches to combining evidence using probability and fuzzy set theory. 

A perennial problem in multisensor fusion involves combining data from multiple sensors to track 
moving targets. Gauss originally addressed this problem for estimating the orbits of asteroids by devel- 
oping the method of least squares. In its most general form, this problem is not tractable. In general, we 
do not know a priori how many targets exist or how to assign observations to potential targets. Hence, 
we must simultaneously estimate the state (e.g., position and velocity) of N targets based on M sensor 
reports and also determine which of the M reports belong to (or should be assigned to) each of the N 
targets. This problem may be complicated by closely spaced, maneuvering targets with potential obser- 
vational clutter and false alarms. 

Part II of this book presents alternative views of this multisensor, multitarget tracking problem. In 
Chapter 8, T. Kirubarajan and Yaakov Bar-Shalom present an overview of their approach for probabilistic
data association (PDA) and the joint PDA (JPDA) methods. These have been useful in dense target 
tracking environments. In Chapter 9, Jeffrey Uhlmann describes another approach using an approximate 
method for addressing the data association combination problem. A classical Bayesian approach to target 
tracking and identification is described by Lawrence D. Stone in Chapter 10. This has been applied to 
problems in target identification and tracking for undersea vehicles. Recent research by Aubrey B. Poore, 
Suihua Lu, and Brian J. Suchomel is summarized in Chapter 11. Poore’s approach combines the problem 
of estimation and data association by generalizing the optimization problem, followed by development 
of efficient computational methods. In Chapter 12, Simon Julier and Jeffrey K. Uhlmann discuss issues 
related to the estimation of target error and how to treat the codependence between sensors. They extend
this work to nonlinear systems in Chapter 13. Finally, in Chapter 14, Ronald Mahler provides a very 
extensive discussion of multitarget, multisensor tracking using an approach based on random set theory. 

Part III of this book addresses issues of the design and development of data fusion systems. It begins 
with Chapter 15 by Ed Waltz and David L. Hall, and describes a systemic approach for deriving data 
fusion system requirements. Chapter 16 by Christopher Bowman and Alan Steinberg provides a general 
discussion of the systems engineering process for data fusion systems including the selection of appro- 
priate architectures. In Chapter 17, David L. Hall, James Llinas, Christopher L. Bowman, Lori McConnel, 
and Paul Applegate provide engineering guidelines for the selection of data fusion algorithms. In Chapter 
18, Richard Antony presents a discussion of database management support, with applications to tactical 
data fusion. New concepts for designing human-computer interfaces (HCI) for data fusion systems are 
summarized in Chapter 19 by Mary Jane Hall, Sonya Hall, and Timothy Tate. Performance assessment 
issues are described by James Llinas in Chapter 20. Finally, in Chapter 21, David L. Hall and Alan N. 
Steinberg present the dirty secrets of data fusion. The experience of implementing data fusion systems 
described in this section was primarily gained on DoD applications; however, the lessons learned should 
be of value to system designers for any application. 

Part IV of this book provides a taste of the breadth of applications to which data fusion technology 
can be applied. Mary L. Nichols, in Chapter 22, presents a limited survey of some DoD fusion systems. 
In Chapter 23, Carl S. Byington and Amulya K. Garga describe the use of data fusion to improve the 
ability to monitor complex mechanical systems. Robert J. Hansen, Daniel Cooke, Kenneth Ford, and 
Steven Zornetzer provide an overview of data fusion applications at the National Aeronautics and Space 
Administration (NASA) in Chapter 24. In Chapter 25, Richard R. Brooks describes an application of 
data fusion funded by DARPA. Finally, in Chapter 26, Hans Keithley describes how to determine the 
utility of data fusion for C4ISR. This fourth part of the book is not by any means intended to be a 
comprehensive survey of data fusion applications. Instead, it is included to provide the reader with a 
sense of different types of applications. Finally, Part V of this book provides a list of Internet Web sites 
and news groups related to multisensor data fusion. 

The editors hope that this handbook will be a valuable addition to the bookshelves of data fusion 
researchers and system designers. We remind the reader that data fusion remains an evolving discipline. 
Even for classic problems, such as multisensor, multitarget tracking, competing approaches exist. The book 
has sought to identify and provide a representation of the leading methods in data fusion. The reader 
should be advised, however, that there are disagreements in the data fusion community (especially by 
some of the contributors to this book) concerning which method is best. It is interesting to read the 
descriptions that the authors in this book present concerning the relationship between their own techniques 
and those of the other authors. Many of this book’s contributors have written recent texts that advocate 
a particular method. These authors have condensed or summarized that information as a chapter here. 

We take the view that each competing method must be considered in the context of a specific 
application. We believe that there is no such thing as a generic data fusion system. Instead, there are 
numerous applications to which data fusion techniques can be applied. In our view, there is no such 
thing as a magic approach or technique. Even very sophisticated algorithms may be corrupted by a lack 
of a priori information or incorrect information concerning sensor performance. Thus, we advise the 
reader to become a knowledgeable and demanding consumer of fusion algorithms. 

We hope that this text will become a companion to other texts on data fusion methods and techniques, 
and that it assists the data fusion community in its continuing maturation process. 

Integration or fusion of data from multiple sensors improves the accuracy of applications ranging from
target tracking and battlefield surveillance to nondefense applications such as industrial process moni- 
toring and medical diagnosis. 

1.1 Introduction 



In recent years, significant attention has focused on multisensor data fusion for both military and 
nonmilitary applications. Data fusion techniques combine data from multiple sensors and related infor- 
mation to achieve more specific inferences than could be achieved by using a single, independent sensor. 

The concept of multisensor data fusion is hardly new. As humans and animals have evolved, they have
developed the ability to use multiple senses to help them survive. For example, assessing the quality of 
an edible substance may not be possible using only the sense of vision; the combination of sight, touch, 
smell, and taste is far more effective. Similarly, when vision is limited by structures and vegetation, the 
sense of hearing can provide advance warning of impending dangers. Thus, multisensory data fusion
is naturally performed by animals and humans to assess more accurately the surrounding environment 
and to identify threats, thereby improving their chances of survival. 

While the concept of data fusion is not new, the emergence of new sensors, advanced processing
techniques, and improved processing hardware has made real-time fusion of data increasingly viable.
Just as the advent of symbolic processing computers (e.g., the Symbolics computer and the Lambda
machine) in the early 1970s provided an impetus to artificial intelligence, recent advances in computing
and sensing have provided the capability to emulate, in hardware and software, the natural data fusion
capabilities of humans and animals. Currently, data fusion systems are used extensively for target tracking,
automated identification of targets, and limited automated reasoning applications. Data fusion technol- 
ogy has rapidly advanced from a loose collection of related techniques to an emerging true engineering 
discipline with standardized terminology, collections of robust mathematical techniques, and established
system design principles. 

Applications for multisensor data fusion are widespread. Military applications include automated 
target recognition (e.g., for smart weapons), guidance for autonomous vehicles, remote sensing, battle- 
field surveillance, and automated threat recognition systems, such as identification-friend-foe-neutral 
(IFFN) systems. Nonmilitary applications include monitoring of manufacturing processes, condition- 
based maintenance of complex machinery, robotics, and medical applications. 

Techniques to combine or fuse data are drawn from a diverse set of more traditional disciplines, 
including digital signal processing, statistical estimation, control theory, artificial intelligence, and classic 
numerical methods. Historically, data fusion methods were developed primarily for military applications.
However, in recent years, these methods have been applied to civilian applications and a bidirectional
transfer of technology has begun. 

1.2 Multisensor Advantages

Fused data from multiple sensors provides several advantages over data from a single sensor. First, if 
several identical sensors are used (e.g., identical radars tracking a moving object), combining the obser- 
vations will result in an improved estimate of the target position and velocity. A statistical advantage is 
gained by adding the N independent observations (e.g., the estimate of the target location or velocity is
improved by a factor proportional to N^{1/2}), assuming the data are combined in an optimal manner. This
same result could also be obtained by combining N observations from an individual sensor. 
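
The statistical advantage described above can be checked numerically. The following is a minimal sketch, assuming independent observations with equal Gaussian noise, for which an equal-weight average is the optimal combination. The function names, the noise level, and the true position of 10.0 are illustrative assumptions, not values from the text.

```python
import random
import statistics


def fused_estimate(observations):
    """Equal-weight average of the observations.

    Optimal only under the stated assumption of independent,
    equal-variance observation errors.
    """
    return sum(observations) / len(observations)


def empirical_error(n_sensors, sigma=1.0, trials=2000, seed=0):
    """Standard deviation of the fused estimate over many simulated trials."""
    rng = random.Random(seed)
    true_position = 10.0  # hypothetical true target position
    estimates = []
    for _ in range(trials):
        obs = [rng.gauss(true_position, sigma) for _ in range(n_sensors)]
        estimates.append(fused_estimate(obs))
    return statistics.pstdev(estimates)


single = empirical_error(1)    # one observation per estimate
fused = empirical_error(16)    # sixteen observations per estimate
ratio = single / fused         # should be close to sqrt(16) = 4
```

With sixteen observations the ratio should come out close to four, the square root of N. If the observation errors are correlated or have unequal variances, an equal-weight average is no longer optimal and the improvement is smaller.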

A second advantage involves using the relative placement or motion of multiple sensors to improve 
the observation process. For example, two sensors that measure angular directions to an object can be 
coordinated to determine the position of an object by triangulation. This technique is used in surveying 
and for commercial navigation. Similarly, the use of two sensors, one moving in a known way with 
respect to another, can be used to measure instantaneously an object’s position and velocity with respect 
to the observing sensors. 
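
Triangulation from two angle-only sensors can be sketched in a few lines. This example assumes a flat, two-dimensional geometry, noise-free bearings measured counterclockwise from the +x axis, and known sensor positions. The function name and coordinates are hypothetical.

```python
import math


def triangulate(p1, bearing1, p2, bearing2):
    """Intersect two bearing lines and return the (x, y) position.

    p1, p2: known sensor positions as (x, y) tuples.
    bearing1, bearing2: angles in radians, counterclockwise from +x.
    """
    d1 = (math.cos(bearing1), math.sin(bearing1))
    d2 = (math.cos(bearing2), math.sin(bearing2))
    # Solve p1 + t*d1 = p2 + s*d2 for t using the 2-D cross product.
    denom = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(denom) < 1e-12:
        raise ValueError("bearings are parallel; position is not observable")
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    t = (dx * d2[1] - dy * d2[0]) / denom
    return (p1[0] + t * d1[0], p1[1] + t * d1[1])


# Two sensors 10 units apart, both sighting a target located at (5, 5).
x, y = triangulate((0.0, 0.0), math.radians(45.0),
                   (10.0, 0.0), math.radians(135.0))
```

When the two bearings are nearly parallel the denominator approaches zero and small angular errors produce large position errors, which is why the relative placement of the sensors matters.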

A third advantage gained by using multiple sensors is improved observability. Broadening the baseline
of physical observables can result in significant improvements. Figure 1.1 provides a simple example of
a moving object, such as an aircraft, that is observed by both a pulsed radar and a forward-looking
infrared (FLIR) imaging sensor. The radar can accurately determine the aircraft’s range but has a limited
ability to determine the angular direction of the aircraft. By contrast, the infrared imaging sensor can 
accurately determine the aircraft’s angular direction but cannot measure range. If these two observations 
are correctly associated (as shown in Figure 1.1), the combination of the two sensors provides a better 




FIGURE 1.1 A moving object observed by both a pulsed radar and an infrared imaging sensor.
TABLE 1.1 Representative Data Fusion Applications for Defense Systems

| Specific Applications | Inferences Sought by Data Fusion Process | Primary Observable Data | Surveillance Volume | Sensor Platforms |
|---|---|---|---|---|
| Ocean surveillance | Detection, tracking, identification of targets and events | EM signals; Acoustic signals; Nuclear-related; Derived observations | Hundreds of nautical miles; Air/surface/subsurface | Ships; Aircraft; Submarines; Ground-based; Ocean-based |
| Air-to-air and surface-to-air defense | Detection, tracking, identification of aircraft | EM radiation | Hundreds of miles (strategic); Miles (tactical) | Ground-based; Aircraft |
| Battlefield intelligence, surveillance, and target acquisition | Detection and identification of potential ground targets | EM radiation | Tens of hundreds of miles about a battlefield | Ground-based; Aircraft |
| Strategic warning and defense | Detection of indications of impending strategic actions; Detection and tracking of ballistic missiles and warheads | EM radiation; Nuclear-related | Global | Satellites; Aircraft |

determination of location than could be obtained by either of the two independent sensors. This results 
in a reduced error region, as shown in the fused or combined location estimate. A similar effect may be
obtained in determining the identity of an object based on observations of an object's attributes. For
example, there is evidence that bats identify their prey by a combination of factors, including size, texture
(based on acoustic signature), and kinematic behavior. 
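
The reduced error region obtained by combining a range-accurate sensor with an angle-accurate sensor can be illustrated with a small calculation. The sketch below assumes both sensors report a two-dimensional position estimate with a Gaussian error covariance, that their errors are independent, and that the reports are already correctly associated. It uses inverse-covariance weighting, one common way to combine such estimates; the numbers and function names are hypothetical. The first axis is taken along the line of sight (range) and the second across it (cross-range), and the infrared sensor's inability to measure range is modeled as a very large range variance.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def add2(m, n):
    """Element-wise sum of two 2x2 matrices."""
    return [[m[i][j] + n[i][j] for j in range(2)] for i in range(2)]


def matvec2(m, v):
    """Product of a 2x2 matrix and a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]


def fuse(x1, p1, x2, p2):
    """Inverse-covariance (information) weighted fusion of two estimates.

    Assumes independent Gaussian errors; returns (fused_state, fused_cov).
    """
    i1, i2 = inv2(p1), inv2(p2)
    p = inv2(add2(i1, i2))
    y1, y2 = matvec2(i1, x1), matvec2(i2, x2)
    x = matvec2(p, [y1[0] + y2[0], y1[1] + y2[1]])
    return x, p


# Radar: accurate in range, poor in cross-range (illustrative values).
radar_x, radar_p = [10.0, 1.5], [[0.01, 0.0], [0.0, 4.0]]
# Infrared: accurate in cross-range, essentially no range information.
ir_x, ir_p = [14.0, 0.1], [[100.0, 0.0], [0.0, 0.01]]

fused_x, fused_p = fuse(radar_x, radar_p, ir_x, ir_p)
```

In this example the fused variances are slightly below 0.01 on both axes, so the combined error region is smaller than that of either sensor alone, and the fused position stays close to the radar's range and the infrared sensor's cross-range value.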

1.3 Military Applications 

The Department of Defense (DoD) community focuses on problems involving the location, character- 
ization, and identification of dynamic entities such as emitters, platforms, weapons, and military units. 
These dynamic data are often termed an order-of-battle database or order-of-battle display (if superimposed
on a map display). Beyond achieving an order-of-battle database, DoD users seek higher-level
inferences about the enemy situation (e.g., the relationships among entities and their relationships with 
the environment and higher level enemy organizations). Examples of DoD-related applications include 
ocean surveillance, air-to-air defense, battlefield intelligence, surveillance and target acquisition, and 
strategic warning and defense. Each of these military applications involves a particular focus, a sensor 
suite, a desired set of inferences, and a unique set of challenges, as shown in Table 1.1. 

Ocean surveillance systems are designed to detect, track, and identify ocean-based targets and events. 
Examples include antisubmarine warfare systems to support Navy tactical fleet operations and automated
systems to guide autonomous vehicles. Sensor suites can include radar, sonar, electronic intelligence
(ELINT), observation of communications traffic, infrared, and synthetic aperture radar (SAR) observations.
The surveillance volume for ocean surveillance may encompass hundreds of nautical miles and
focus on air, surface, and subsurface targets. Multiple surveillance platforms can be involved and numerous
targets can be tracked. Challenges to ocean surveillance involve the large surveillance volume, the
combination of targets and sensors, and the complex signal propagation environment — especially for 
underwater sonar sensing. An example of an ocean surveillance system is shown in Figure 1.2. 

Air-to-air and surface-to-air defense systems have been developed by the military to detect, track, and 
identify aircraft and anti-aircraft weapons and sensors. These defense systems use sensors such as radar, 
passive electronic support measures (ESM), infrared, identification-friend-foe (IFF) sensors, electro-optic
FIGURE 1.2 An example of an ocean surveillance system.

image sensors, and visual (human) sightings. These systems support counter-air, order-of-battle aggre- 
gation, assignment of aircraft to raids, target prioritization, route planning, and other activities. Chal- 
lenges to these data fusion systems include enemy countermeasures, the need for rapid decision making, 
and potentially large combinations of target-sensor pairings. A special challenge for IFF systems is the 
need to confidently and non-cooperatively identify enemy aircraft. The proliferation of weapon systems 
throughout the world has resulted in little correlation between the national origin of a weapon and the 
combatants who use the weapon. 

Battlefield intelligence, surveillance, and target acquisition systems attempt to detect and identify 
potential ground targets. Examples include the location of land mines and automatic target recognition. 
Sensors include airborne surveillance via SAR, passive electronic support measures, photo reconnaissance,
ground-based acoustic sensors, remotely piloted vehicles, electro-optic sensors, and infrared sensors. Key 
inferences sought are information to support battlefield situation assessment and threat assessment. 

1.4 Nonmilitary Applications 

A second broad group addressing data fusion problems are the academic, commercial, and industrial 
communities. They address problems such as the implementation of robotics, automated control of 
industrial manufacturing systems, development of smart buildings, and medical applications. As with 
military applications, each of these applications has a particular set of challenges and sensor suites, and 
a specific implementation environment (see Table 1.2).

Remote sensing systems have been developed to identify and locate entities and objects. Examples 
include systems to monitor agricultural resources (e.g., to monitor the productivity and health of crops),
locate natural resources, and monitor weather and natural disasters. These systems rely primarily on
image systems using multispectral sensors. Such processing systems are dominated by automatic image
processing. Multispectral imagery, such as that from the Landsat satellite system and the SPOT system, is used.
A technique frequently used for multisensor image fusion involves adaptive neural networks. Multi-image
data are processed on a pixel-by-pixel basis and input to a neural network to classify automatically the
contents of the image. False colors are usually associated with types of crops, vegetation, or classes of
objects. Human analysts can readily interpret the resulting false color synthetic image.

A key challenge in multi-image data fusion is coregistration. This problem requires the alignment of 
two or more photos so that the images are overlaid in such a way that corresponding picture elements 



©2001 CRC Press LLC 



TABLE 1.2 Representative Nondefense Data Fusion Applications

| Specific Applications | Inferences Sought by Data Fusion Process | Primary Observable Data | Surveillance Volume | Sensor Platforms |
|---|---|---|---|---|
| Condition-based maintenance | Detection, characterization of system faults; Recommendations for maintenance/corrections | EM signals; Acoustic signals; Magnetic; Temperatures; X-rays | Microscopic to hundreds of feet | Ships; Aircraft; Ground-based (e.g., factories) |
| Robotics | Object location/recognition; Guide the locomotion of robot (e.g., "hands" and "feet") | Television; Acoustic signals; EM signals; X-rays | Microscopic to tens of feet about the robot | Robot body |
| Medical diagnoses | Location/identification of tumors, abnormalities, and disease | X-rays; NMR; Temperature; IR; Visual inspection; Chemical and biological data | Human body volume | Laboratory |
| Environmental monitoring | Identification/location of natural phenomena (e.g., earthquakes, weather) | SAR; Seismic; EM radiation; Core samples; Chemical and biological data | Hundreds of miles; Miles (site monitoring) | Satellites; Aircraft; Ground-based; Underground samples |



(pixels) on each picture represent the same location on earth (i.e., each pixel represents the same direction
from an observer’s point of view). This coregistration problem is exacerbated by the fact that image 
sensors are nonlinear and perform a complex transformation between the observed three-dimensional 
space and a two-dimensional image. 

A second application area, which spans both military and nonmilitary users, is the monitoring of 
complex mechanical equipment such as turbo machinery, helicopter gear trains, or industrial manufacturing
equipment. For a drivetrain application, for example, sensor data can be obtained from accelerometers,
temperature gauges, oil debris monitors, acoustic sensors, and infrared measurements. An online
condition-monitoring system would seek to combine these observations in order to identify precursors
to failure, such as abnormal gear wear, shaft misalignment, or bearing failure. The use of such condition-based
monitoring is expected to reduce maintenance costs and improve safety and reliability. Such systems
are beginning to be developed for helicopters and other platforms (see Figure 1.3). 



1.5 Three Processing Architectures 

Three basic alternatives can be used for multisensor data: (1) direct fusion of sensor data, (2) representation 
of sensor data via feature vectors, with subsequent fusion of the feature vectors, or (3) processing of each
sensor to achieve high-level inferences or decisions, which are subsequently combined. Each of these 
approaches utilizes different fusion techniques as described and shown in Figures 1.4a, 1.4b, and 1.4c. 

If the multisensor data are commensurate (i.e., if the sensors are measuring the same physical phe- 
nomena, such as two visual image sensors or two acoustic sensors), then the raw sensor data can be 
directly combined. Techniques for raw data fusion typically involve classic estimation methods, such as 
Kalman filtering. Conversely, if the sensor data are noncommensurate, then the data must be fused at 
the feature/state vector level or decision level. 
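
Direct fusion of commensurate data with a Kalman filter can be sketched for the simplest possible case. The example below assumes a single scalar quantity that is constant (or nearly so) over time, observed at each time step by two sensors with known noise variances. The function names, the diffuse prior, and the measurement values are illustrative assumptions.

```python
def kalman_update(x, p, z, r):
    """Scalar Kalman measurement update.

    x, p: current state estimate and its variance.
    z, r: new measurement and its noise variance.
    """
    k = p / (p + r)            # Kalman gain
    x_new = x + k * (z - x)    # blend the prediction with the measurement
    p_new = (1.0 - k) * p      # the variance shrinks after every update
    return x_new, p_new


def fuse_raw(measurements, x0=0.0, p0=1e6, q=0.0):
    """Sequentially fuse raw measurements from commensurate sensors.

    measurements: one list per time step, each holding (z, r) pairs,
    one pair per sensor. q is the process noise variance added at each
    step; q = 0 models a constant state.
    """
    x, p = x0, p0
    for step in measurements:
        p = p + q              # prediction step for a constant-state model
        for z, r in step:
            x, p = kalman_update(x, p, z, r)
    return x, p


# Two time steps, two sensors each: a coarse sensor (variance 1.0)
# and a finer one (variance 0.25).
data = [[(10.2, 1.0), (9.9, 0.25)],
        [(10.1, 1.0), (10.0, 0.25)]]
state, variance = fuse_raw(data)
```

With a diffuse prior, the result is essentially the variance-weighted mean of all four measurements, so the finer sensor dominates the estimate. A tracking filter for a moving target would replace the scalar state with a position and velocity vector and add a motion model to the prediction step.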
FIGURE 1.3 Mechanical diagnostic testbed used by The Pennsylvania State University to perform condition-based
maintenance research.



Feature-level fusion involves the extraction of representative features from sensor data. An example 
of feature extraction is the cartoonist's use of key facial characteristics to represent the human face. This 
technique — which is popular among political satirists — uses key features to evoke recognition of 
famous figures. Evidence confirms that humans utilize a feature-based cognitive function to recognize
objects. In the case of multisensor feature-level fusion, features are extracted from multiple sensor 
observations and combined into a single concatenated feature vector that is input to pattern recognition 
techniques such as neural networks, clustering algorithms, or template methods. 
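
Feature-level fusion can be sketched as concatenation followed by a simple template method. The example assumes each sensor has already reduced its observation to a short list of numeric features on comparable scales, and it classifies the concatenated vector by its nearest stored template. The feature values, class labels, and function names are hypothetical.

```python
import math


def concatenate_features(*sensor_features):
    """Join per-sensor feature lists into one fused feature vector."""
    fused = []
    for features in sensor_features:
        fused.extend(features)
    return fused


def classify(feature_vector, templates):
    """Return the label of the nearest template (Euclidean distance)."""
    return min(templates,
               key=lambda label: math.dist(feature_vector, templates[label]))


# Hypothetical templates: two radar features followed by one infrared feature.
templates = {
    "aircraft": [0.9, 0.8, 0.2],
    "bird": [0.1, 0.2, 0.9],
}

radar_features = [0.85, 0.75]   # e.g., normalized cross section and speed
ir_features = [0.3]             # e.g., normalized thermal contrast

fused_vector = concatenate_features(radar_features, ir_features)
label = classify(fused_vector, templates)
```

Because the distance treats every component equally, features with very different numeric ranges should be normalized before they are concatenated. A neural network or clustering algorithm could take the place of the template lookup without changing the fusion step.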

Decision-level fusion combines sensor information after each sensor has made a preliminary deter- 
mination of an entity’s location, attributes, and identity. Examples of decision-level fusion methods 
include weighted decision methods (voting techniques), classical inference, Bayesian inference, and 
Dempster-Shafer's method.
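
Two of the decision-level methods named above, weighted voting and Bayesian inference, can be sketched as follows. The example assumes each sensor has already declared an identity, that the voting weights reflect some prior measure of sensor reliability, and that the sensor declarations are conditionally independent given the true identity. All labels, weights, and probabilities are illustrative.

```python
from collections import defaultdict


def weighted_vote(declarations):
    """Return the label with the largest total weight.

    declarations: list of (label, weight) pairs, one per sensor.
    """
    totals = defaultdict(float)
    for label, weight in declarations:
        totals[label] += weight
    return max(totals, key=totals.get)


def bayes_fuse(prior, likelihoods):
    """Combine sensor declarations with Bayes' rule.

    prior: dict mapping label -> P(label).
    likelihoods: list of dicts, one per sensor, mapping
        label -> P(that sensor's declaration | label).
    Assumes conditional independence between sensors.
    """
    posterior = dict(prior)
    for likelihood in likelihoods:
        for label in posterior:
            posterior[label] *= likelihood[label]
    total = sum(posterior.values())
    return {label: value / total for label, value in posterior.items()}


# Three sensors vote with hypothetical reliability weights.
vote = weighted_vote([("foe", 0.6), ("friend", 0.9), ("foe", 0.5)])

# Two sensors report; the likelihoods describe how probable each
# report is under each hypothesis.
posterior = bayes_fuse(
    {"friend": 0.5, "foe": 0.5},
    [{"friend": 0.2, "foe": 0.7}, {"friend": 0.4, "foe": 0.6}],
)
```

In this example the two "foe" votes outweigh the single, more heavily weighted "friend" vote, and the Bayesian posterior also favors "foe". Unlike voting, the Bayesian form yields a probability that can be compared against a decision threshold.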



1.6 A Data Fusion Process Model 



One of the historical barriers to technology transfer in data fusion has been the lack of a unifying 
terminology that crosses application-specific boundaries. Even within military applications, related but 
distinct applications — such as IFF, battlefield surveillance, and automatic target recognition — used 
different definitions for fundamental terms, such as correlation and data fusion. To improve communications
among military researchers and system developers, the Joint Directors of Laboratories (JDL) Data
Fusion Working Group, established in 1986, began an effort to codify the terminology related to data 
fusion. The result of that effort was the creation of a process model for data fusion and a data fusion 
lexicon, shown in Figure 1.5. The JDL process model, which is intended to be very general and useful 
across multiple application areas, identifies the processes, functions, categories of techniques, and specific
techniques applicable to data fusion. The model is a two-layer hierarchy. At the top level, shown in 
Figure 1.5, the data fusion process is conceptualized by sensor inputs, human-computer interaction, 
database management, source preprocessing, and four key subprocesses: 
FIGURE 1.4 (a) Direct fusion of sensor data. (b) Representation of sensor data via feature vectors and subsequent
fusion of the feature vectors. (c) Processing of each sensor to achieve high-level inferences or decisions that are
subsequently combined.
FIGURE 1.5 Joint Directors of Laboratories (JDL) process model for data fusion.

Level 1 processing (Object Refinement) is aimed at combining sensor data to obtain the most reliable 
and accurate estimate of an entity’s position, velocity, attributes, and identity; 

Level 2 processing (Situation Refinement) dynamically attempts to develop a description of current 
relationships among entities and events in the context of their environment; 

Level 3 processing (Threat Refinement) projects the current situation into the future to draw inferences
about enemy threats, friend and foe vulnerabilities, and opportunities for operations; 

Level 4 processing (Process Refinement) is a meta-process that monitors the overall data fusion process
to assess and improve real-time system performance. 

For each of these subprocesses, the hierarchical JDL model identifies specific functions and categories of 
techniques (in the model’s second layer) and specific techniques (in the model's lowest layer). Implementation
of data fusion systems integrates and interleaves these functions into an overall processing flow.

The data fusion process model is augmented by a hierarchical taxonomy that identifies categories of 
techniques and algorithms for performing the identified functions. An associated lexicon has been 
developed to provide a consistent definition of data fusion terminology. The JDL model is described in
more detail in Chapter 2. 

1.7 Assessment of the State of the Art 



The technology of multisensor data fusion is rapidly evolving. There is much concurrent ongoing research
to develop new algorithms, to improve existing algorithms, and to assemble these techniques into an 
overall architecture capable of addressing diverse data fusion applications. 

The most mature area of data fusion process is Level 1 processing — using multisensor data to 
determine the position, velocity, attributes, and identity of individual objects or entities. Determining 
the position and velocity of an object based on multiple sensor observations is a relatively old problem.
Gauss and Legendre developed the method of least squares for determining the orbits of asteroids. 1 
Numerous mathematical techniques exist for performing coordinate transformations, associating observations
to observations or to tracks, and estimating the position and velocity of a target. Multisensor
target tracking is dominated by sequential estimation techniques such as the Kalman filter. Challenges in
this area involve circumstances in which there is a dense target environment, rapidly maneuvering targets, 
or complex signal propagation environments (e.g., involving multipath propagation, cochannel interfer- 
ence, or clutter). However, single-target tracking in excellent signal-to-noise environments for dynami- 
cally well-behaved (i.e., dynamically predictable) targets is a straightforward, easily resolved problem. 

Current research focuses on solving the assignment and maneuvering target problem. Techniques such
as multiple-hypothesis tracking (MHT), probabilistic data association methods, random set theory, and
multiple criteria optimization theory are being used to resolve these issues. Some researchers are utilizing
multiple techniques simultaneously, guided by a knowledge-based system capable of selecting the appropriate
solution based on algorithm performance.
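
The assignment problem can be made concrete with a deliberately simple baseline. The sketch below is not one of the techniques named above. It is a brute-force global nearest-neighbor assignment, shown only to illustrate what is being solved. It assumes exactly one report per track, no false alarms or missed detections, and a squared-distance cost. The exhaustive search grows factorially with the number of targets, which is one reason more sophisticated methods are needed. All names and coordinates are hypothetical.

```python
import itertools
import math


def best_assignment(tracks, reports):
    """Exhaustively find the report-to-track assignment of least total cost.

    tracks, reports: equal-length lists of (x, y) positions.
    Returns (assignment, cost), where assignment[i] is the index of the
    report assigned to track i.
    """
    best_cost, best_perm = math.inf, None
    for perm in itertools.permutations(range(len(reports))):
        cost = sum(math.dist(tracks[i], reports[j]) ** 2
                   for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return list(best_perm), best_cost


# Predicted track positions and unordered sensor reports.
tracks = [(0.0, 0.0), (10.0, 0.0), (5.0, 5.0)]
reports = [(9.6, 0.3), (5.2, 4.9), (0.4, -0.1)]

assignment, total_cost = best_assignment(tracks, reports)
```

Here each track is matched to the report that lies nearest to it. Closely spaced targets, clutter, and missed detections break the one-report-per-track assumption and make the association ambiguous.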

A special problem in Level 1 processing involves the automatic identification of targets based on 
observed characteristics or attributes. To date, object recognition has been dominated by feature-based
methods in which a feature vector (i.e., a representation of the sensor data) is mapped into feature space 
with the hope of identifying the target based on the location of the feature vector relative to a priori 
determined decision boundaries. Popular pattern recognition techniques include neural networks and 
statistical classifiers. Although numerous techniques are available, the ultimate success of these methods 
relies on the selection of good features. (Good features provide excellent class separability in feature space;
bad features result in greatly overlapping feature space areas for several classes of target.) More research
is needed in this area to guide the selection of features and to incorporate explicit knowledge about target
classes. For example, syntactic methods provide additional information about the makeup of a target. In 
addition, some limited research is proceeding to incorporate contextual information — such as target 
mobility with respect to terrain — to assist in target identification. 

Level 2 and Level 3 fusion (situation refinement and threat refinement) are currently dominated by 
knowledge-based methods such as rule-based blackboard systems. These areas are relatively immature 
and have numerous prototypes, but few robust, operational systems. The main challenge in this area is 
to establish a viable knowledge base of rules, frames, scripts, or other methods to represent knowledge 
about situation assessment or threat assessment. Unfortunately, only very primitive cognitive models 
exist to replicate the human performance of these functions. Much research is needed before reliable and
large-scale knowledge-based systems can be developed for automated situation assessment and threat 
assessment. New approaches that offer promise are the use of fuzzy logic and hybrid architectures, which 
extend the concept of blackboard systems to hierarchical and multitime scale orientations. 

Finally, Level 4 processing, which assesses and improves the performance and operation of an ongoing 
data fusion process, has a mixed maturity. For single sensor operations, techniques from operations 
research and control theory have been applied to develop effective systems, even for complex single 
sensors such as phased array radars. In contrast, situations that involve multiple sensors, external mission
constraints, dynamic observing environments, and multiple targets are more challenging. To date,
considerable difficulty has been encountered in attempting to model and incorporate mission objectives and
constraints to balance optimized performance with limited resources, such as computing power and
communication bandwidth (e.g., between sensors and processors), and other effects. Methods from utility
theory are being applied to develop measures of system performance and measures of effectiveness.
Knowledge-based systems are being developed for context-based approximate reasoning. Significant
improvements will result from the advent of smart, self-calibrating sensors, which can accurately and
dynamically assess their own performance. 

Data fusion has suffered from a lack of rigor with regard to the test and evaluation of algorithms and 
the means of transitioning research findings from theory to application. The data fusion community 
must insist on high standards for algorithm development, test, and evaluation; creation of standard test 
cases; and systematic evolution of the technology to meet realistic applications. On a positive note, the 
introduction of the JDL process model and emerging nonmilitary applications are expected to result in
increased cross-discipline communication and research. The nonmilitary research in robotics,
condition-based maintenance, industrial process control, transportation, and intelligent buildings will produce
innovations that will cross-fertilize the entire field of data fusion technology. The many challenges and 
opportunities related to data fusion establish it as an exciting research field with numerous applications. 
1.8 Additional Information



Additional information about multisensor data fusion may be found in the following references: 

• D. L. Hall, Mathematical Techniques in Multisensor Data Fusion, Artech House, Inc. (1992) — 
provides details on the mathematical and heuristic techniques for data fusion 

• E. Waltz and J. Llinas, Multisensor Data Fusion, Artech House, Inc. (1990) — presents an excellent
overview of data fusion especially for military applications 

• L. A. Klein, Sensor and Data Fusion Concepts and Applications, SPIE Optical Engineering Press, 
Volume TT 14 (1993) — presents an abbreviated introduction to data fusion 

• R. Antony, Principles of Data Fusion Automation, Artech House, Inc. (1995) — provides a discus- 
sion of data fusion processes with special focus on database issues to achieve computational 
efficiency 

• A multimedia computer-based training package, "Introduction to Data Fusion, A multimedia 
computer-based training package" (available from Artech House, Inc., Boston, MA, 1995).

• A data fusion lexicon is available from TECH REACH Inc. at http://www.techreachinc.com. 

Reference 

1. Sorenson, H.W., Least-squares estimation: from Gauss to Kalman, IEEE Spectrum, July 1970,
63-68.
2.1 Introduction



The data fusion model, developed in 1985 by the U.S. Joint Directors of Laboratories (JDL) Data Fusion 
Group*, with subsequent revisions, is the most widely used system for categorizing data fusion-related 
functions. The goal of the JDL Data Fusion Model is to facilitate understanding and communication 
among acquisition managers, theoreticians, designers, evaluators, and users of data fusion techniques to 
permit cost-effective system design, development, and operation. 1,2

This chapter discusses the most recent model revision (1998): its purpose, content, application, and 
relation to other models. 3 



2.2 What Is Data Fusion? What Isn't? 



2.2.1 The Role of Data Fusion 

The universality of data fusion has engendered a profusion of overlapping research and development
in many applications. A jumble of confusing terminology (illustrated in Figure 2.1) and ad hoc
methods in a variety of scientific, engineering, management, and educational disciplines obscures the
fact that the same ground has been plowed repeatedly.



*Now rechartered as the Data and Information Fusion Group within the Deputy Director for Research and
Engineering’s Information System Technology Panel at the U.S. Department of Defense.
FIGURE 2.1 (Con)fusion of terminology.

Often, the role of data fusion has been unduly restricted to a subset of processes and its relevancy has 
been limited to particular state estimation problems. For example, in military applications, such as 
targeting or tactical intelligence, the focus is on estimating and predicting the state of specific types of 
entities in the external environment (e.g., targets, threats, or military formations). In this context, the 
applicable sensors/sources that the system designer considers are often restricted to sensors that directly 
collect data from targets of interest. 

Ultimately, however, such problems are inseparable from other aspects of the system’s assessment of 
the world. In a tactical system, this will involve estimation of one’s own state in relation to the relevant 
external entities: friends, foes, neutrals, and background. Estimation of the state of targets and threats 
cannot be separated from the problems of estimating one’s own location and motion, of calibrating one’s 
sensor performance and alignment, and of validating one’s library of target sensor and environment 
models. The data fusion problem, then, becomes that of achieving a consistent, comprehensive estimate 
and prediction of some relevant portion of the world state. In such a view, data fusion involves exploiting 
all sources of data to solve all relevant state estimation/prediction problems, where relevance is determined 
by utility in forming plans of action. 

The data fusion problem, therefore, encompasses a number of interrelated problems: estimation and 
prediction of states of entities both external and internal to the acting system, and the interrelations 
among such entities. Evaluating the system’s models of the characteristics and behavior of all of these 
external and organic entities is, likewise, a component of the overall problem of estimating the actual 
world state. 

Making the nontrivial assumption that the universe of discourse for a given system can be partitioned
into an unknown but finite number of entities of interest, the problem of consistently estimating a
multiobject world state can be defined as shown in Figure 2.2. 4 Here, x_1, ..., x_k are entity states, so the global
state estimation problem becomes one of finding the finite set of entity states X with maximum a posteriori
likelihood.

The complexity of the data fusion system engineering process is characterized by difficulties in 

• representing the uncertainty in observations and in models of the phenomena that generate 
observations; 

• combining noncommensurate information (e.g., the distinctive attributes in imagery, text, and 
signals); 

• maintaining and manipulating the enormous number of alternative ways of associating and 
interpreting large numbers of observations of multiple entities. 
Find Most Likely Multiobject State:

\[
\hat{X} = \arg\max \int \lambda(X)\,\delta X
        = \arg\max \sum_{k=0}^{\infty} \int \lambda(\{x_1,\ldots,x_k\})\,dx_1 \cdots dx_k
\]


Deriving general principles for developing and evaluating data fusion processes — whether automatic 
or manual — will help to take advantage of the similarity in the underlying problems of data association 
and combination that span engineering, analysis, and cognitive situations. Furthermore, recognizing the 
common elements of diverse data fusion problems can provide extensive opportunities for synergistic 
development. Such synergy — enabling the development of information systems that are cost-effective 
and trustworthy — requires common performance evaluation measures, system engineering methodol- 
ogies, architecture paradigms, and multispectral models of targets and data collection systems. 

2.2.2 Definition of Data Fusion 

The initial JDL Data Fusion Lexicon defined data fusion as: 

A process dealing with the association, correlation, and combination of data and information from 
single and multiple sources to achieve refined position and identity estimates, and complete and timely 
assessments of situations and threats, and their significance. The process is characterized by continuous 
refinements of its estimates and assessments, and the evaluation of the need for additional sources, or 
modification of the process itself, to achieve improved results. 1 

As the above discussion suggests, this initial definition is rather too restrictive. A definition is needed 
that can capture the fact that similar underlying problems of data association and combination occur in 
a very wide range of engineering, analysis, and cognitive situations. In response, the initial definition 
requires a number of modifications: 

1. Although the concept combination of data encompasses the broad range of problems of interest, 
correlation does not. Statistical correlation is merely one method for generating and evaluating 
hypothesized associations among data. 

2. Association is not an essential ingredient in combining multiple pieces of data. Recent work in 
random set models of data fusion provides generalizations that allow state estimation of multiple 
targets without explicit report-to-target association. 4-6 
3. Single or multiple sources is comprehensive; therefore, it is superfluous in a definition.

4. The reference to position and identity estimates should be broadened to cover all varieties of state 
estimation. 

5. Complete assessments are not required in all applications; timely, being application-relative, is 
superfluous. 

6. Threat assessment limits the application to situations where threat is a factor. This description 
must also be broadened to include any assessment of the cost or utility implications of estimated 
situations. In general, data fusion involves refining and predicting the states of entities and aggre- 
gates of entities and their relation to one’s own mission plans and goals. Cost assessments can 
include variables such as the probability of surviving an estimated threat situation. 

7. Not every process of combining information involves collection management or process refine- 
ment. Thus, the definition’s second sentence is best construed as illustrative, not definitional. 

Pruning these extraneous qualifications, the model revision proposes the following concise definition 
for data fusion: 3 

Data fusion is the process of combining data or information to estimate or predict entity states. 

Data fusion involves combining data — in the broadest sense — to estimate or predict the state of 
some aspect of the universe. Often the objective is to estimate or predict the physical state of entities: 
their identity, attributes, activity, location, and motion over some past, current, or future time period. 
If the job is to estimate the state of people (or any other sentient beings), it may be important to estimate 
or predict the individuals’ and groups’ informational and perceptual states and the interaction of these 
with physical states (this point is discussed in Section 2.5). 

Arguments about whether data fusion or some other label best describes this very broad concept are 
pointless. Some people have adopted terms such as information integration in an attempt to generalize 
earlier, narrower definitions of data fusion (and, perhaps, to distance themselves from old data fusion 
approaches and programs). However, relevant research should not be neglected simply because of shifting 
terminological fashion. Although no body of common and accepted usage currently exists, this broad 
concept is an important topic for a unified theoretical approach and, therefore, deserves its own label. 

2.3 Models and Architectures 



The use of the JDL Data Fusion Model in system engineering can best be explained by considering the 
role of models in system architectures in general. According to the IEEE definition, 7 an architecture is a 
“structure of components, their relationships, and the principles and guidelines governing their design 
and evolution over time.” Architectures serve to coordinate capabilities to achieve interoperability and 
affordability. As such, general requirements for an architecture are that it must 

1. Identify a focused purpose, 

2. Facilitate user understanding/communication, 

3. Permit comparison and integration, 

4. Promote expandability, modularity, and reusability, 

5. Promote cost-effective system development, 

6. Apply to the required range of situations. 

The JDL Model has been used to develop an architecture paradigm for data fusion 8-10 (as discussed in 
Chapter 18); however, in reality, the JDL Model is merely an element of an architecture. A model is an 
abstract description of a set of functions or processes that may be components of a system of a particular 
type, without indication of software or physical implementation. That being the case, the previous list 
of architectural virtues applies, with the exception of item (1), which is relevant only to specific system 
architectures. 
FIGURE 2.3 Revised JDL data fusion model (1998). 3



The JDL Model was designed to be a functional model — a set of definitions of the functions that 
could comprise any data fusion system. Distinguishing functional models from process models and other 
kinds of models is important. Process models specify the interaction among functions within a system. 
Examples of process models include Boyd’s Observe, Orient, Decide and Act (OODA) loop, the Predict, 
Extract, Match and Search (PEMS) loop, and the UK Intelligence cycle and waterfall process models cited 
by Bedworth and O’Brien. 11 

Another type of model is a formal model, constituting a set of axioms and rules for manipulating 
entities. Examples are probabilistic, possibilistic, and evidential reasoning frameworks.*

A model should clarify the elements of problems and solutions to facilitate recognition of common- 
alities in problems and in solutions. Among questions that a model should help answer are the following: 

• Has the problem been solved before?

• Has the same problem appeared in a different form and is there an existing solution?

• Is there a related problem with similar constraints? 

• Is there a related problem with the same unknowns? 

• Can the problem be subdivided into parts that are easier to solve? 

• Can the constraints be relaxed to transform the problem into a familiar one? 12 

2.3.1 Data Fusion "Levels" 

Of the many ways to differentiate types of data fusion functions, the JDL model has gained the widest 
usage. The JDL model’s differentiation of functions into fusion levels (depicted in Figure 2.3) provides 
a useful distinction among data fusion processes that relate to the refinement of “objects,” “situations,” 
“threats,” and “processes.” 2 



* This is seen as equivalent to the concept of framework as used in Reference 11. 
TABLE 2.1 Characterization of the Revised Data Fusion Levels

Data Fusion Level            Association Process   Estimation Process   Entity Estimated
L.0 Sub-Object Assessment    Assignment            Detection            Signal
L.1 Object Assessment        Assignment            Attribution          Individual Object
L.2 Situation Assessment     Aggregation           Relation             Aggregation (Situation)
L.3 Impact Assessment        Aggregation           Plan Interaction     Effect (situation, given plans)
L.4 Process Refinement       Planning              (Control)            (Action)*



* Process Refinement does not involve estimation, but rather control. Therefore, its product is a 
control sequence, which — by the duality of estimation and control — relates to a controlled entity’s 
actions as an estimate relates to an actual state. 15 



Nonetheless, several concerns must be raised with regard to the ways in which these JDL data fusion 
levels have been used in practice: 

• The JDL levels have frequently been misinterpreted as specifying a process model (i.e., as a canonical 
guide for process flow within a system: “perform Level 1 fusion first, then Levels 2, 3, and 4...”).

• The original JDL model names and definitions (e.g., “threat refinement”) seem to focus on tactical 
military applications, so that the extension of the concepts to other applications is not obvious. 

• For these and other reasons, the literature is rife with diverse interpretations of the data fusion 
levels. The levels have been interpreted as distinguishing any of the following: (a) the kinds of 
association and/or characterization processing involved, (b) the kinds of entities being character- 
ized, and (c) the degree to which the data used in the characterization has already been processed. 

The objectives in the 1998 revision of the definitions for the levels are (a) to provide a useful catego- 
rization representing logically different types of problems, which are generally (though not necessarily) 
solved by different techniques and (b) to maintain a degree of consistency with regard to terminology. 
The former is a matter of engineering; the latter is a language issue. 

Figure 2.3 shows the suggested revised model. The proposed new definitions are as follows: 

• Level 0 — Sub-Object Data Assessment: estimation and prediction of signal- or object-observable 
states on the basis of pixel/signal-level data association and characterization. 

• Level 1 — Object Assessment: estimation and prediction of entity states on the basis of inferences 
from observations. 

• Level 2 — Situation Assessment: estimation and prediction of entity states on the basis of inferred 
relations among entities. 

• Level 3 — Impact Assessment: estimation and prediction of effects on situations of planned or 
estimated/predicted actions by the participants (e.g., assessing susceptibilities and vulnerabilities 
to estimated/predicted threat actions, given one’s own planned actions). 

• Level 4 — Process Refinement (an element of Resource Management): adaptive data acquisition 
and processing to support mission objectives. 

Table 2.1 provides a general characterization of these concepts. Note that the levels are differentiated 
first on the basis of types of estimation process, which roughly correspond to the types of entity for 
which state is estimated. 
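
The characterization in Table 2.1 can be held as a small lookup structure. The sketch below is one possible encoding, not part of the JDL model itself; the class name `FusionLevel`, the field names, and the helper `levels_using` are hypothetical, and the shared association entries for Levels 0/1 and Levels 2/3 follow the statement in Section 2.3.4.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FusionLevel:
    number: int
    name: str
    association: str   # association process
    estimation: str    # estimation process
    entity: str        # entity estimated


# Rows transcribed from Table 2.1 (Level 4 yields control, not an estimate).
LEVELS = (
    FusionLevel(0, "Sub-Object Assessment", "Assignment", "Detection", "Signal"),
    FusionLevel(1, "Object Assessment", "Assignment", "Attribution", "Individual object"),
    FusionLevel(2, "Situation Assessment", "Aggregation", "Relation", "Aggregation (situation)"),
    FusionLevel(3, "Impact Assessment", "Aggregation", "Plan interaction",
                "Effect (situation, given plans)"),
    FusionLevel(4, "Process Refinement", "Planning", "Control", "Action"),
)


def levels_using(association):
    """Return the level numbers whose association process matches."""
    return [lvl.number for lvl in LEVELS if lvl.association == association]


print(levels_using("Assignment"), levels_using("Aggregation"))
```

Keeping the levels as data rather than as a pipeline reflects the point that they categorize functions and do not prescribe an order of execution.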

2.3.2 Association and Estimation 

In the common cases where the fusion process involves explicit association in performing state estimates, 
a corresponding distinction is made among the types of association processes. Figure 2.4 depicts assign- 
ment matrices that are typically formed in each of these processing levels. The examples have the form 
of two-dimensional matrices, as commonly used in associating reports to tracks. 
FIGURE 2.4 Assignment matrices for various data fusion “levels.”



Level 0 association involves hypothesizing the presence of a signal (i.e., of a common source of sensed 
energy) and estimating its state. Level 0 associations can include (a) signal detection obtained by inte- 
grating a time series of data (e.g., the output of an analog-to-digital converter) and (b) feature extraction 
from a region in imagery. In this case, a region could correspond to a cluster of closely spaced objects, 
or to part of an object, or simply to a differentiable spatio-temporal region. 
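
Case (a) above, detection by integrating a time series, can be sketched as a sliding energy sum compared with a threshold. The function name `detect_signal`, the window length, and the threshold are assumptions chosen for illustration; an operational detector would derive the threshold from noise statistics and a required false-alarm rate.

```python
def detect_signal(samples, window, threshold):
    """Return the start index of every window whose summed energy exceeds the threshold."""
    detections = []
    for start in range(len(samples) - window + 1):
        # Integrate (sum) the squared samples over the window.
        energy = sum(s * s for s in samples[start:start + window])
        if energy > threshold:
            detections.append(start)
    return detections


# Illustrative digitized samples: low-level noise with a burst in the middle.
samples = [0.1, -0.2, 0.1, 2.0, 2.1, 1.9, 0.0, 0.1]
print(detect_signal(samples, window=3, threshold=5.0))
```

Each detection is a Level 0 hypothesis that a common source of energy is present in that window.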

Level 1 association involves selecting observation reports (or tracks from prior fusion nodes in a 
processing sequence) for inclusion in a track. Such a track is a hypothesis that a certain set of reports is 
the total set of reports available to the system referencing some individual entity. Global Level 1 hypotheses 
map the set of observations available to the system to tracks. For systems in which observations are 
assumed to be associated with only one track, this is a set-partitioning problem; more generally, it is a 
set-covering problem. 
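
The distinction between set partitioning and set covering can be made concrete with a small check on a global hypothesis. This is a sketch under the assumption that a hypothesis is represented as a list of sets of report identifiers; the function name and the return labels are hypothetical.

```python
def classify_hypothesis(observations, tracks):
    """Classify a global Level 1 hypothesis.

    observations: iterable of report identifiers available to the system.
    tracks: list of sets, each holding the reports assigned to one track.
    Returns "partition" if every report is in exactly one track,
    "cover" if every report is in at least one track and some are shared,
    and "incomplete" if any report is left unassigned.
    """
    counts = {obs: 0 for obs in observations}
    for track in tracks:
        for obs in track:
            if obs not in counts:
                raise ValueError(f"unknown report: {obs!r}")
            counts[obs] += 1
    if any(c == 0 for c in counts.values()):
        return "incomplete"
    if all(c == 1 for c in counts.values()):
        return "partition"
    return "cover"


reports = ["z1", "z2", "z3"]
print(classify_hypothesis(reports, [{"z1", "z2"}, {"z3"}]))
print(classify_hypothesis(reports, [{"z1", "z2"}, {"z2", "z3"}]))
```

The second call shares report z2 between two tracks, which is only admissible when the system does not assume one track per observation.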

Level 2 association involves associating tracks (i.e., hypothesized entities) into aggregations. The state 
of the aggregate entity is represented as a network of relations among aggregation elements. Any variety 
of relations — physical, organizational, informational, and perceptual — can be considered, as appro- 
priate to the given information system’s mission. As the class of estimated relationships and the numbers 
of interrelated entities broaden, the term situation is used to refer to an aggregate object of estimation. 
A model for such development is presented by Steinberg and Washburn. 14 

Level 3 association is usually implemented as a prediction, drawing particular kinds of inferences from 
Level 2 associations. Level 3 fusion estimates the impact of an assessed situation (i.e., the outcome of 
various plans as they interact with one another and with the environment). The impact estimate can 
include likelihood and cost/utility measures associated with potential outcomes of a player’s planned 
actions. 
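
One simple way to combine the likelihood and cost/utility measures mentioned above is an expected-utility sum over the predicted outcomes of each plan. The text does not prescribe this particular measure; the plan names, probabilities, and utilities below are invented for illustration.

```python
def expected_utility(outcomes):
    """outcomes: iterable of (probability, utility) pairs for one planned action."""
    return sum(p * u for p, u in outcomes)


def rank_plans(plans):
    """plans: dict mapping plan name to its (probability, utility) outcomes.

    Returns plan names ordered from highest to lowest expected utility.
    """
    return sorted(plans, key=lambda name: expected_utility(plans[name]), reverse=True)


# Hypothetical predicted outcomes for two candidate plans.
plans = {
    "advance": [(0.5, 10), (0.5, -20)],
    "hold": [(1.0, -1)],
}
print(rank_plans(plans))
```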

Because Level 2 has been defined so broadly, Level 3 is actually a subset of Level 2. Whereas Level 2 
involves estimating or predicting all types of relational states, Level 3 involves predicting some of the 
relationships between a specific player and his environment, including interaction with other players’ 
actions, given the player’s action plan and that of every other player. More succinctly, Level 2 concerns 
relations in general: paradigmatically third-person, objective relations. Level 3 concerns first-person 
relations — involving the system or its user — with an attendant sense of subjective utility. 

Level 4 processing involves planning and control, not estimation. As discussed by Bowman, 15 just as 
a formal duality exists between estimation and control, there is a similar duality between association and 
planning. Therefore, Level 4 association involves assigning resources to tasks. 
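
Assigning resources to tasks has the same matrix form as report-to-track association. The following sketch searches all one-to-one assignments of a small square utility matrix exhaustively; the matrix values and the function name are assumptions, and the factorial cost of the search means a practical system would use a dedicated assignment algorithm instead.

```python
from itertools import permutations


def best_assignment(utility):
    """Find the one-to-one resource-to-task assignment with the highest total utility.

    utility[r][t] is the payoff of giving resource r to task t (square matrix).
    Returns (total_utility, {resource_index: task_index}).
    """
    n = len(utility)
    best_total, best_perm = float("-inf"), None
    for perm in permutations(range(n)):
        total = sum(utility[r][perm[r]] for r in range(n))
        if total > best_total:
            best_total, best_perm = total, perm
    return best_total, dict(enumerate(best_perm))


# Hypothetical payoffs for three sensors (rows) and three tasks (columns).
utility = [
    [9, 2, 1],
    [3, 8, 2],
    [4, 1, 7],
]
print(best_assignment(utility))
```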
2.3.3 Context Sensitivity and Situation Awareness

Once again, the JDL model is a functional model, not a process model. Therefore, it would be a mistake 
to assume that the information flow in data fusion must proceed strictly from Level 1 to Level 2 to Level 3. 
Such a mistake has, unfortunately, been common with system designers. A “bottom-up” fusion process 
is justified only under the following conditions: 

• Sensor observations can be partitioned into measurements, each of which originates from, at most, 
one real entity. 

• All information relevant to the estimation of an entity state is contained in the measurement of 
the individual entity. 

Neither of these conditions is necessarily true, and the second is usually false. 

The value of estimating entity states on the basis of context is becoming increasingly apparent. A 
system that integrates data association and estimation processes of all “levels” will permit entities to be 
understood as parts of complex situations. A relational analysis, as illustrated in Figure 2.5, permits 
evidence applicable to a local estimation problem to be propagated through a complex relational network. 

Note that inferencing based on hypothesized relationships among entities can occur within and 
between all of the data fusion levels. Figure 2.6 depicts typical information flow across the data fusion 
levels. Level 0 functions combine measurements to generate estimates of signals or features. At Level 1, 
signal/feature reports are combined to estimate the states of objects. These are combined, in turn, at 
Level 2 to estimate situations (i.e., states of aggregate entities). Level 3, according to this logical relation- 
ship, seems to be out of numerical sequence. It is a “higher” function than the planning function of 
Level 4. Indeed, Process Refinement (Level 4) processes can interact with association/estimation data 
fusion processes in a variety of ways, managing the operation of individual fusion nodes or that of larger 
ensembles of such nodes. The figure reinforces the point that the data fusion levels are not to be 
taken as a prescription for the sequencing of a system’s process flow. Processing partitioning and 
flow must be designed in terms of the individual system requirements, as discussed in Chapter 16. 

2.3.4 Attributive and Relational Functions 

Table 2.1 shows that association within Levels 0 and 1 involves assignment, while Levels 2 and 3 association
involves aggregation. This can be modeled as the distinction between

• estimation on the basis of observations: (x|Z) or (X|Z) for entity or world states, given a set of
observations, Z, and

• estimation on the basis of inferred relations among entities: (x|R) or (X|R), where R is a set of
ordered n-tuples <x_1, ..., x_{n-1}, r>, the x_i being entity states and r a relational state.

FIGURE 2.6 Characteristic data flow among the “levels.”

Figure 2.5 provides an example of the relationship of Level 1 and 2 hypotheses. A Level 2 hypothesis 
can be modeled as a directed graph, the nodes of which may correspond to entity tracks and, therefore, 
to Level 1 hypotheses. More precisely, a node in a Level 2 hypothesis corresponds to a perceived entity. 
The set of observations associated directly with that node can be considered to be a Level 1 hypothesis 
imbedded in the Level 2 structure. Of course, entities can be inferred from their context alone, without 
having been observed directly. For example, in the SA-6 battery of Figure 2.5, the estimation of the presence
of launchers at three corners of a diamond pattern may support the inference of a fourth launcher in the 
remaining corner. The figure further illustrates the point that hypotheses regarding physical objects (e.g., 
the mobile missile launcher at the lower right of Figure 2.5) may themselves be Level 2 relational constructs. 
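
A minimal sketch of such a directed-graph hypothesis follows. Each node carries the reports directly associated with it (its embedded Level 1 hypothesis), and a node with no reports represents an entity inferred from context. The class and method names, the relation label `launcher_of`, and the default of four launchers per battery (taken from the diamond-pattern example) are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Level2Hypothesis:
    """Directed graph: nodes are perceived entities, edges are relations."""
    observations: dict = field(default_factory=dict)  # node name -> list of report ids
    relations: list = field(default_factory=list)      # (source, relation, target)

    def add_entity(self, name, reports=()):
        # The report list plays the role of the embedded Level 1 hypothesis.
        self.observations[name] = list(reports)

    def relate(self, source, relation, target):
        self.relations.append((source, relation, target))

    def launchers_of(self, battery):
        return [s for s, rel, t in self.relations
                if rel == "launcher_of" and t == battery]

    def inferred_only(self):
        """Entities with no direct observations, i.e., inferred from context."""
        return sorted(n for n, reports in self.observations.items() if not reports)


def complete_battery(hyp, battery, expected_launchers=4):
    """Add placeholder launcher nodes until the battery has the expected count."""
    missing = expected_launchers - len(hyp.launchers_of(battery))
    for k in range(missing):
        name = f"{battery}/inferred-launcher-{k + 1}"
        hyp.add_entity(name)  # no reports: inferred from context alone
        hyp.relate(name, "launcher_of", battery)
    return hyp


hyp = Level2Hypothesis()
hyp.add_entity("battery-1")  # the battery itself is a relational construct
for i, report in enumerate(["z1", "z2", "z3"], start=1):
    hyp.add_entity(f"launcher-{i}", [report])
    hyp.relate(f"launcher-{i}", "launcher_of", "battery-1")
complete_battery(hyp, "battery-1")
print(hyp.inferred_only())
```

Three observed launchers lead to a fourth node that exists only by inference, alongside the battery node, which also has no direct observations of its own.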

2.3.4.1 Types of Relationships

Assembling an exhaustive list of relationships of interest is impossible, which is one reason that Level 2
fusion (Situation Assessment) is generally more difficult than Level 1 fusion. The process model for
aggregate entities, particularly those involving human activity, is often poorly understood, being less
directly inferable from underlying physics than Level 1 observable attributes. For this reason,
automation of Situational Awareness has relied
on so-called cognitive techniques that are intended to copy the inference process of human analysts. 
However, knowledge extraction is a notoriously difficult undertaking. Furthermore, Level 2 problems 
often involve a much higher dimensionality, corresponding to the relations that may be part of an 
inference. Finally, no general metric exists for assessing the relevance of data in these unspecified, high- 
dimension spaces, unlike the simple distance metrics commonly used for Level 1 validation gating. 
Relationships of interest to particular context exploitation or situation awareness concerns can include: 

• Spatio-temporal relationships; 

• Part/whole relationships; 
FIGURE 2.7 Attributive and relational state example.

• Organizational relationships (e.g., X is a subordinate unit to Y) and roles (e.g., X is the unit 
commander, company clerk, CEO, king, or court jester of Y); 

• Various causal relations, whereby X changes the state of Y: 

- Physical state (damaging, destroying, moving, invading, repairing) 

- Informational state (communicating, informing, revealing) 

- Perceptual or other mental state (persuading, deceiving, intimidating) 

- Financial or legal state (paying, fining, authorizing, forbidding, sentencing) 

• Intentional relationships, whereby X wishes to change the state of Y (targeting, jamming,
cajoling, lying to);

• Semantic relationships (X is of the same type as Y); 

• Similarity relationships (X is taller than Y);

• Legal relationships (X owns Y, X leases Y to Z); 

• Emotional relationships (love, hate, fear); 

• Biological relationships (kinship, ethnicity). 

2.3.4.2 Attributive and Relational Inferencing Example

Figure 2.7 provides an example of the attributive and relational states within and among the elements 
of an aggregate entity. Steinberg and Washburn 14 discuss formal methods for inferring relational states 
to refine entity-level and aggregate-level state estimates. A Bayesian network technique is used to combine 

• the estimate of an entity state, X_i, based on a set of observations, Z_i, in a Level 1 hypothesis (track)
and

• the estimate of an entity state, X_i, based on a set of relations, R_i, among nodes (tracks) in a Level
2 hypothesis (aggregation).

The distribution of discrete states, x_d, for X, given its assignment to the given node in a Level 2 hypothesis,
ς, will be determined by this “evidence” from each of these sources:

\[
p_{L2}(x_d) = \frac{p_{L1}(x_d)\,\Lambda(x_d)}{\sum_{x_d'} p_{L1}(x_d')\,\Lambda(x_d')} \tag{2.1}
\]


where p_L1(x_d) is the probability currently assigned to discrete state, x_d, by Level 1 data fusion of
observations associated with node X, and Λ(x_d) is the evidence communicated to X from the tracks related to
Y in a Level 2 association hypothesis.

The evidence from the nodes communicating with X will be the product of evidence from each such 
node Y. 



\[
\Lambda(x_d) = \prod_{\{X,Y\} \in \varsigma} \Lambda_Y(x_d) \tag{2.2}
\]

The factors Λ_Y(x_d) are interpreted in terms of relational states among entities as follows. Ordered pairs
of entities are hypothesized as having relational states, r_i(X,Y). A given track, Y, may be involved in several
competing relations relative to X with probability distributions p[r_i(X,Y)].*

Updating a track, Y, contributes information for evaluating the probability of each state, x, of a possible 
related entity, X. As with attributive states, relational states, r, can be decomposed into discrete and 
continuous components, r_d and r_c (as exemplified in Figure 2.7). Then this contextual evidence is given by

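\[
\begin{aligned}
\Lambda_Y(x_d) &= \sum_{y_d} p_{L1}(y_d)\, p[y_d \mid x_d]
 = \sum_{y_d} p_{L1}(y_d) \int p[y_d \mid r, x_d]\, p[r \mid x_d]\, dr \\
&= \sum_{y_d} p_{L1}(y_d) \sum_{r_d} \int p[r_d, r_c, y_d \mid x_d]\, dr_c
\end{aligned} \tag{2.3}
\]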

Inferences can be drawn about a hypothesized entity denoted by track X_i, given the Level 2 hypothesis
that the entity corresponding to X_i stands in a particular relationship to another hypothesized entity
corresponding to a track X_j. In the example shown in Figure 2.8 (based on sets of relationships as
illustrated in Figure 2.7), it is assumed that an entity, elliptically referred to as X_1, has been estimated
to have probabilities p(x_1) of being an entity of types and activity states x_1 on the basis of Level 1 association
of sensor reports z_1 and z_2. Then, if X_1 and X_2 meet the criteria of particular relationships for any states
x_1 and x_2 of X_1 and X_2, respectively, inferences can be drawn regarding the probabilities as to the type
and activity of X_2.

For example, given the estimate that X_1 and X_2 stand in certain spatio-temporal and other relationships,
as listed in Figure 2.7, there is a mutual reinforcement of pairs of Level 1 state estimates <x_1, x_2> that are
consistent with this relationship (e.g., that X_1 is a Straight Flush radar and X_2 is an SA-6 surface-to-air
missile battery) and suppression of nonconsistent state pairs. Conditioned on this association, the
estimate of the likelihood of track X_2 can be refined (i.e., the hypothesis that the associated observations,
z_3 in Figure 2.8, relate to the same entity). Furthermore, likelihood and state estimates to other nodes
adjoining X_2 can be further propagated (e.g., to infer the battery-association and the type and activity
of a missile launcher, X_3, hypothesized on the basis of observations z_4 and z_5). As noted above, the presence,
identity, and activity state of entities that have not been observed can be inferred (e.g., the presence of
a full complement of launchers and other associated equipment can be inferred, conditioned on the
assessed presence of an SA-6 battery).

* For simplicity, the present discussion is limited to binary relations. In cases where more complex relations are
relevant, a second order can be employed, whereby entities can have binary links to nodes representing n-ary
relations. 16

FIGURE 2.8 Attributive and relational inferencing example.

Each node in a Level 2 hypothesis combines the effects of evidence from all adjacent nodes and propagates 
the updated probability distributions and likelihood (i.e., association confidence) regarding an entity state 
to the other nodes. Loops in the inference flow occur; however, methods have been defined to deal with them. 
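
The node update described above can be illustrated with a minimal sketch restricted to discrete states. It assumes that the Level 2 probability is the Level 1 probability multiplied by the product of the evidence factors and then renormalized, and that each factor is the sum over the neighbor's discrete states of its Level 1 probability times a conditional probability. The function names, the state labels, and all numerical values are hypothetical and serve only as an illustration.

```python
from math import prod

def contextual_evidence(p_l1_y, p_y_given_x):
    """Evidence factor for each state x: sum over y of p_L1(y) * p[y | x].

    p_l1_y      -- dict mapping each discrete state y of node Y to its Level 1 probability
    p_y_given_x -- dict mapping each state x of node X to a dict {y: p[y | x]}
    """
    return {
        x: sum(p_l1_y[y] * cond.get(y, 0.0) for y in p_l1_y)
        for x, cond in p_y_given_x.items()
    }

def level2_update(p_l1_x, evidences):
    """Multiply the Level 1 probabilities by every neighbor's evidence, then renormalize."""
    unnormalized = {
        x: p * prod(ev[x] for ev in evidences) for x, p in p_l1_x.items()
    }
    total = sum(unnormalized.values())
    return {x: v / total for x, v in unnormalized.items()}

# Hypothetical numbers: X is a possible missile battery, Y is a nearby radar track.
p_l1_battery = {"SA-6": 0.5, "other": 0.5}
p_l1_radar = {"Straight Flush": 0.9, "other": 0.1}
p_radar_given_battery = {
    "SA-6": {"Straight Flush": 0.8, "other": 0.2},
    "other": {"Straight Flush": 0.1, "other": 0.9},
}

evidence = contextual_evidence(p_l1_radar, p_radar_given_battery)
p_l2_battery = level2_update(p_l1_battery, [evidence])
print(p_l2_battery)
```

With these illustrative values, the relationship to a probable Straight Flush radar raises the probability that the battery is an SA-6 from 0.5 to about 0.80. The sketch ignores continuous relational states and loops in the inference flow.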

2.3.4.3 A Generalization about the Levels 

Level 1 data fusion involves estimating and predicting the state of inferred entities based on observed 
features. Level 2 data fusion involves estimating and predicting the state of inferred entities on the basis 
of relationships to other inferred entities. Because of their reliance on these inference mechanisms, Levels 
0 and 3 are seen as special cases of Levels 1 and 2, respectively (as illustrated in Figure 2.9): 

• Level 0 is a special case of Level 1, where entities are signals/features. 

• Level 3 is a special case of Level 2, where relations are first-person relations. 

Earlier, this chapter asserted that Level 4 fusion is not fusion at all, but a species of Resource Manage- 
ment; therefore, only two super-levels of fusion remain, and these are partitioned by type of data 
association. A secondary partitioning by type of entity characterized distinguishes within these super- 
levels. Section 2.5 presents the case for an even finer partitioning within the JDL levels. 

2.4 Beyond the Physical 

In general, then, the job of data fusion is that of estimating or predicting the state of some aspect of the 
world. When that aspect includes people (or any other information systems, for that matter), it can be 
relevant to include a consideration of informational and perceptual states and their relations to physical 
states. Informational state refers to the data available to the target. Perceptual state refers to the target’s 
own estimate of the world state. 17 (See Chapter 15.)

FIGURE 2.9 Attributive and relational inferencing.

FIGURE 2.10 Entity states: three aspects.

A person or other information system (represented by the box at the left of Figure 2.10) senses physical 
stimuli as a function of his physical state in relation to that of the stimulating physical world. These 
include both stimuli originating outside the person’s body and those originating from within. 

The person can combine multiple sensory reports to develop and refine estimates of perceived entities 
(i.e., tracks), aggregations, and impacts on his plans and goals (Levels 1-3 fusion). This ensemble of 
perceived entities and their interrelationships is part of the person’s perceptual state. As depicted in the 
figure, his perceptual state can include an estimation of physical, informational, and perceptual states 
and relations of things in the world. The person’s perceptions can be encoded symbolically for manipulation,
communication, or storage. The set of symbolic representations available to the person is his informational
state. Informational state can encompass available data stores such as databases and documents. The notion 
of informational state is probably more applicable to a closed system (e.g., a nonnetworked computer) 
than to a person, for whom the availability of information is generally a matter of degree. The tripartite 
view of reality developed by Waltz 17 extends the work of philosopher Karl Popper. The status of information 
as a separable aspect of reality is certainly subject to discussion. Symbols can have both a physical and a 
perceptual aspect: they can be expressed by physical marks or sounds, but their interpretation (i.e., 
recognizing them orthographically as well as semantically) is a matter of perception. 

As seen in this example, symbol recognition (e.g., reading) is clearly a perceptual process. It is a form 
of context-sensitive model-based processing. The converse process, that of representing perceptions
symbolically for the purpose of recording or communicating them, produces a physical product (text,
sounds, etc.). Such physical products must be interpreted as symbols before their informational content
can be accessed. Whether there is more to information than these physical and perceptual aspects remains
to be demonstrated. Furthermore, the distinction between information and perception is not the difference
between what a person knows and what he thinks (cf. Plato’s Theaetetus, in which knowledge is shown
to involve true opinion plus some sense of understanding). Nonetheless, the notion of informational
state is useful as a topic for estimation because knowing what information is available to an entity (e.g., 
an enemy commander’s sources of information) is an important element in estimating (and influencing) 
his perceptual state and, therefore, in predicting (and influencing) changes. 

The person acts in response to his perceptual state, thereby affecting his and the rest of the world’s 
physical state. His actions may include comparing and combining various representations of reality: his 
network of perceived entities and relationships. He may search his memory or seek more information 
from the outside. These are processes associated with data fusion Level 4. 

Other responses can include encoding perceptions in symbols for storage or communication. These 
can be incorporated in the person’s physical actions and, in turn, are potential stimuli to people (including 
the stimulator himself) and other entities in the physical world (as depicted at the right of Figure 2.10). 
Table 2.2 describes the elements of state estimation for each of the three aspects shown in Figure 2.10. 
Note the recursive reference in the bottom right cell. 

Figure 2.11 illustrates this recursive character of perception. Each decision maker interacts with every 
other one on the basis of an estimate of current, past, and future states. These include not only estimates 
of who is doing what, where, and when in the physical world, but also what their informational states 
and perceptual states are (including, “What do they think of me?”). 

If state estimation and prediction are performed by an automated system, that system may be said to 
possess physical and perceptual states, the latter containing estimates of physical, informational, and 
perceptual states of some aspects of the world. 



TABLE 2.2 Elements of State Estimation

| Object Aspect | Attributive State (Discrete) | Attributive State (Continuous) | Relational State (Discrete) | Relational State (Continuous) |
|---|---|---|---|---|
| Physical | Type, ID | Location/kinematics | | |
| Informational | data | Available data values | Informational relation type; info source/recipient role allocation | Source data quality, quantity, timeliness; output quality, quantity, timeliness |
| Perceptual | Goals; priorities | Cost assignments; confidence; plans/schedules | Influence relation type; influence source/recipient role allocation | Source confidence; world state estimates (per this table) |

FIGURE 2.11 World states and nested state estimates.

2.5 Comparison with Other Models 



2.5.1 Dasarathy's Functional Model 

Dasarathy 18 has defined a very useful categorization of data fusion functions in terms of the types of 
data/information that are processed and the types that result from the process. Table 2.3 illustrates the 
types of inputs/outputs considered. Processes corresponding to the cells in the highlighted diagonal
region are described by Dasarathy, using the abbreviations DAI-DAO, DAI-FEO, FEI-FEO, FEI-DEO, and
DEI-DEO. A striking benefit of this categorization is the natural manner in which technique types can 
be mapped into it. 



TABLE 2.3 Interpretation of Dasarathy’s Data Fusion I/O Model

TABLE 2.4 Expansion of Dasarathy’s Model to Data Fusion Levels 0-4

We have augmented the categorization as shown in the remaining matrix cells by adding labels to
these cells, relating input/output (I/O) types to process types, and filling in the unoccupied cells in the
original matrix.

Note that Dasarathy’s original categories represent constructive, or data-driven, processes in which 
organized information is extracted from relatively unorganized data. Additional processes — FEI-DAO, 
DEI-DAO, and DEI-FEO — can be defined that are analytic, or model-driven, such that organized 
information (a model) is analyzed to estimate lower-level data (features or measurements) as they relate 
to the model. Examples include predetection tracking (an FEI-DAO process), model-based feature-extraction
(DEI-FEO), and model-based classification (DEI-DAO). The remaining cell in Table 2.3,
DAI-DEO, has not been addressed in a significant way (to the authors’ knowledge) but could involve
the direct estimation of entity states without the intermediate step of feature extraction.
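
The nine input/output combinations discussed above can be enumerated with a short sketch. It assumes the usual reading of the abbreviations (DA for data, FE for features, DE for decisions, with I and O marking input and output). The function name and the wording of the category labels are hypothetical.

```python
# Abstraction levels in increasing order of organization.
LEVELS = ["DA", "FE", "DE"]  # data, features, decisions

# The five categories attributed to Dasarathy in the text.
DASARATHY_ORIGINAL = {"DAI-DAO", "DAI-FEO", "FEI-FEO", "FEI-DEO", "DEI-DEO"}

def classify(inp, out):
    """Return (label, kind) for one cell of the 3x3 input/output matrix."""
    label = f"{inp}I-{out}O"
    if label in DASARATHY_ORIGINAL:
        kind = "constructive (data-driven), original category"
    elif LEVELS.index(out) < LEVELS.index(inp):
        # Output is less organized than input: a model is analyzed
        # to estimate lower-level data.
        kind = "analytic (model-driven)"
    else:
        # Only DAI-DEO remains: decisions estimated directly from data.
        kind = "constructive, skips feature extraction"
    return label, kind

matrix = dict(classify(i, o) for i in LEVELS for o in LEVELS)
for label, kind in matrix.items():
    print(f"{label}: {kind}")
```

The enumeration yields five original categories, three analytic ones (FEI-DAO, DEI-DAO, and DEI-FEO), and the single remaining DAI-DEO cell.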

Dasarathy’s categorization can readily be expanded to encompass Level 2, 3, and 4 processes, as shown 
in Table 2.4. Here, rows and columns have been added to correspond to the object types listed in Figure 2.4. 

Dasarathy’s categories represent a useful refinement of the JDL levels. Not only can each of the levels 
(0-4) be subdivided on the basis of input data types, but our Level 0 can also be subdivided into detection 
processes and feature-extraction processes.* 

Of course, much of Table 2.4 remains virgin territory; researchers have seriously explored only its 
northwest quadrant, with tentative forays southeast. Most likely, little utility will be found in either the 
northeast or the southwest. However, there may be gold buried somewhere in those remote stretches. 



* Level 0 remains a relatively new concept in data fusion (although quite mature in the detection and signal
processing communities); therefore, it has not been studied to a great degree. The extension of formal data fusion
methods into this area must evolve before the community will be ready to begin partitioning it. Encouragingly,
Bedworth and O’Brien 11 describe a similar partitioning of Level 1-related functions in the Boyd and UK Intelligence
Cycle models.

TABLE 2.5 Bedworth and O'Brien's Comparison of Data Fusion-related Models 11

| Activity being undertaken | Waterfall model | JDL Model | Boyd Loop | Intelligence Cycle |
|---|---|---|---|---|
| Command execution | | | Act | Disseminate |
| Decision making process | Decision making | Level 4 | Decide | |
| Threat assessment | | Level 3 | Orient | Evaluate |
| Situation assessment | Situation assessment | Level 2 | | |
| Information processing | Pattern processing; feature extraction | Level 1 | | Collate |
| Signal processing | Signal processing | Level 0 | | |
| Source/sensor acquisition | Sensing | | Observe | Collect |



FIGURE 2.12 The “Omnibus” process model. 11 [Figure labels: OBSERVE, ORIENT, DECIDE, ACT; Sensing, Signal Processing, Feature Extraction, Pattern Processing, Context Processing, Decision Making, Control, Resource Tasking; Sensor Data Fusion, Feature Fusion, Soft Decision Fusion, Hard Decision Fusion, Sensor Management.]



2.5.2 Bedworth and O'Brien's Comparison among Models and Omnibus 

Bedworth and O’Brien 11 provide a commendable comparison and attempted synthesis of data fusion 
models. That comparison is summarized in Table 2.5. By comparing the discrimination capabilities of 
the various process models listed — and of the JDL and Dasarathy’s functional models — Bedworth and 
O’Brien suggest a comprehensive “Omnibus” process model as represented in Figure 2.12. 

As noted by Bedworth and O’Brien, an information system’s interaction with its environment need 
not be the single cyclic process depicted in Figure 2.12. Rather, the OODA process is often hierarchical 
and recursive, with analysis/decision loops supporting detection, estimation, evaluation, and response 
decisions at several levels (illustrated in Figure 2.13). 

2.6 Summary 

The goal of the JDL Data Fusion Model is to serve as a functional model for use by diverse elements of 
the data fusion community, to the extent that such a community exists, and to encourage coordination 
and collaboration among diverse communities. A model should clarify the elements of problems and 
solutions to facilitate recognition of commonalities in problems and in solutions. The virtues listed in
Section 2.3 are significant criteria by which any functional model should be judged. 12

FIGURE 2.13 System interaction via interacting fractal OODA loops.

Additionally, a functional model must be amenable to implementation in process models. A functional 
model must be compatible with diverse instantiations in architectures and allow foundation in theoretical 
frameworks. Once again, the goal of the functional model is to facilitate understanding and communication
among acquisition managers, theoreticians, designers, evaluators, and users of data fusion systems
to permit cost-effective system design, development, and operation.

The revised JDL model is aimed at providing a useful tool of this sort. If used appropriately as part 
of a coordinated system engineering methodology (as discussed in Chapter 16), the model should facilitate 
research, development, test, and operation of systems employing data fusion. This model should 

• Facilitate communications and coordination among theoreticians, developers, and users by pro- 
viding a common framework to describe problems and solutions. 

• Facilitate research by representing underlying principles of a subject. This should enable research- 
ers to coordinate their attack on a problem and to integrate results from diverse researchers. By 
the same token, the ability to deconstruct a problem into its functional elements can reveal the 
limits of our understanding. 

• Facilitate system acquisition and development by enabling developers to see their engineering 
problems as instances of general classes of problems. Therefore, diverse development activities can 
be coordinated and designs can be reused. Furthermore, such problem abstraction should enable 
the development of more cost-effective engineering methods. 

• Facilitate integration and test by allowing the application of performance models and test data 
obtained with other applications of similar designs. 

• Facilitate system operation by permitting a better sense of performance expectations, derived from 
experiences with entire classes of systems. Therefore, a system user will be able to predict his 
system’s performance with greater confidence.

3.1 Introduction



When a major-league outfielder runs down a long fly ball, the tracking of a moving object looks easy. 
Over a distance of a few hundred feet, the fielder calculates the ball’s trajectory to within an inch or two 
and times its fall to within milliseconds. But what if an outfielder were asked to track 100 fly balls at 
once? Even 100 fielders trying to track 100 balls simultaneously would likely find the task an impossible 
challenge. 

Problems of this kind do not arise in baseball, but they have considerable practical importance in other 
realms. The impetus for the studies described in this chapter was the Strategic Defense Initiative (SDI), 
the plan conceived in the early 1980s for defending the U.S. against a large-scale nuclear attack. According 
to the terms of the original proposal, an SDI system would be required to track tens or even hundreds of 
thousands of objects — including missiles, warheads, decoys, and debris — all moving at speeds of up to 
8 kilometers per second. Another application of multiple-target tracking is air-traffic control, which 
attempts to maintain safe separations among hundreds of aircraft operating near busy airports. In particle 
physics, multiple-target tracking is needed to make sense of the hundreds or thousands of particle tracks 
emanating from the site of a high-energy collision. Molecular dynamics has similar requirements. 

The task of following a large number of targets is surprisingly difficult. If tracking a single baseball, 
warhead, or aircraft requires a certain measurable level of effort, then it might seem that tracking 10 similar 
objects would require at most 10 times as much effort. Actually, for the most obvious methods of solving 
the problem, the difficulty is proportional to the square of the number of objects; thus, 10 objects demand 
100 times the effort, and 10,000 objects increase the difficulty by a factor of 100 million. This combinatorial 
explosion is a first hurdle to solving the multiple-target tracking problem. In fact, exploiting all information
to solve the problem optimally requires exponentially scaling effort. This chapter, however, considers
computational issues that arise for any proposed multiple-target tracking system.*

Consider how the motion of a single object might be tracked, based on a series of position reports from 
a sensor such as a radar system. To reconstruct the object’s trajectory, plot the successive positions in 
sequence and then draw a line through them (as shown on the left-hand side of Figure 3.1). Extending 
this line yields a prediction of the object’s future position. Now, suppose you are tracking 10 targets 
simultaneously. At regular time intervals 10 new position reports are received, but the reports do not have 
labels indicating the targets to which they correspond. When the 10 new positions are plotted, each report 
could, in principle, be associated with any of the 10 existing trajectories (as illustrated on the right-hand 
side of Figure 3.1). This need to consider every possible combination of reports and tracks makes the
difficulty of an n-target problem proportional to, or on the order of, $n^2$, which is denoted as $O(n^2)$.

Over the years, many attempts have been made to devise an algorithm for multiple-target tracking 
with better than $O(n^2)$ performance. Some of the proposals offered significant improvements in special
circumstances or for certain instances of the multiple-target tracking problem, but they retained their
$O(n^2)$ worst-case behavior. However, recent results in the theory of spatial data structures have made
possible a new class of algorithms for associating reports with tracks — algorithms that scale better than 
quadratically in most realistic environments. In degenerate cases, in which all of the targets are so densely 
clustered that they cannot be individually resolved, there is no way to avoid comparing each report with 
each track. When each report can be feasibly associated only with a constant number of tracks on average, 
subquadratic scaling is achievable. This will become clear later in the chapter. Even with the new methods, 
multiple-target tracking remains a complex task that strains the capacity of the largest and fastest 
supercomputers. However, the new methods have brought important problem instances within reach. 
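
The difference between comparing every report with every track and using a spatial data structure can be sketched as follows. The example uses a uniform grid whose cell size equals the gate radius. This is one simple spatial index chosen for illustration and is not necessarily the structure developed later in the chapter. The function names, the gate value, and the synthetic data are assumptions.

```python
import random
from collections import defaultdict

def brute_force_pairs(tracks, reports, gate):
    """Test every report against every track: len(reports) * len(tracks) tests."""
    pairs, tests = [], 0
    for ri, (rx, ry) in enumerate(reports):
        for ti, (tx, ty) in enumerate(tracks):
            tests += 1
            if (rx - tx) ** 2 + (ry - ty) ** 2 <= gate ** 2:
                pairs.append((ri, ti))
    return pairs, tests

def grid_pairs(tracks, reports, gate):
    """Bucket tracks into square cells of side `gate`; test only the 3x3 neighborhood."""
    grid = defaultdict(list)
    for ti, (tx, ty) in enumerate(tracks):
        grid[(int(tx // gate), int(ty // gate))].append(ti)
    pairs, tests = [], 0
    for ri, (rx, ry) in enumerate(reports):
        cx, cy = int(rx // gate), int(ry // gate)
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for ti in grid.get((cx + dx, cy + dy), ()):
                    tests += 1
                    tx, ty = tracks[ti]
                    if (rx - tx) ** 2 + (ry - ty) ** 2 <= gate ** 2:
                        pairs.append((ri, ti))
    return pairs, tests

# Synthetic scenario: 500 well-separated tracks, one noisy report per track.
rng = random.Random(0)
tracks = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(500)]
reports = [(x + rng.gauss(0, 0.5), y + rng.gauss(0, 0.5)) for x, y in tracks]

bf_pairs, bf_tests = brute_force_pairs(tracks, reports, gate=2.0)
gr_pairs, gr_tests = grid_pairs(tracks, reports, gate=2.0)
print("brute-force tests:", bf_tests, "grid tests:", gr_tests)
```

Both functions return the same feasible pairs, but the grid version examines only tracks in neighboring cells. If all tracks fell into a single cell (the degenerate, densely clustered case), the grid version would again perform a quadratic number of tests.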

3.1.1 Keeping Track 

The modern need for tracking algorithms began with the development of radar during World War II. 
By the 1950s, radar was a relatively mature technology. Systems were installed aboard military ships and 
aircraft and at airports. The tracking of radar targets, however, was still performed manually by drawing 
lines through blips on a display screen. The first attempts to automate the tracking process were modeled 
closely on human performance. For the single-target case, the resulting algorithm was straightforward — 
the computer accumulated a series of positions from radar reports and estimated the velocity of the 
target to predict its future position. 

Even single-target tracking presented certain challenges related to the uncertainty inherent in position 
measurements. A first problem involves deciding how to represent this uncertainty. A crude approach is 
to define an error radius surrounding the position estimate. This practice implies that the probability of 
finding the target is uniformly distributed throughout the volume of a three-dimensional sphere. Unfortunately,
this simple approach is far from optimal. The error region associated with many sensors is
highly nonspherical; radar, for example, tends to provide accurate range information but has relatively
poorer angular resolution. Furthermore, one would expect the actual position of the target to be closer on
average to the mean position estimate than to the perimeter of the error volume, which suggests, in turn, 
that the probability density should be greater near the center. 

A second difficulty in handling uncertainty is determining how to interpolate the actual trajectory of 
the target from multiple measurements, each with its own error allowance. For targets known to have
constant velocity (i.e., they travel in a straight line at constant speed), there are methods for calculating
the straight-line path that best fits, by some measure, the series of past positions. A desirable property
of this approach is that it should always converge on the correct path — as the number of reports increases, 
the difference between the estimated velocity and the actual velocity should approach zero. On the other 
hand, retaining all past reports of a target and recalculating the entire trajectory every time a new report
arrives is impractical. Such a method would eventually exceed all constraints on computation time and
storage space.

* The material in this chapter updates and supplements material that first appeared in American Scientist. 1

FIGURE 3.1 The information available for plotting a track consists of position reports (shown as dots) from a
sensor such as a radar system. In tracking a single target (left), one can accumulate a series of reports and then fit a
line or curve corresponding to those data points to estimate the object’s trajectory. With multiple targets (right),
there is no obvious way to determine which object has generated each report. Here, five reports appear initially at
timestep t = 1, then five more are received at t = 2. Neither the human eye nor a computer can easily distinguish
which of the later dots goes with which of the earlier ones. (In fact, the problem is even more difficult given that the
reports at t = 2 could be newly detected targets that are not correlated with the previous five reports.) As additional
reports arrive, coherent tracks begin to emerge. The tracks from which these reports were derived are shown in the
lower panels at t = 5. Here and in subsequent figures, all targets are assumed to have constant velocity in two
dimensions. The problem is considerably more difficult for ballistic or maneuvering trajectories in three dimensions.
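
The batch approach of fitting a straight-line path to all past positions can be sketched with ordinary least squares along one coordinate axis. Least squares is assumed here as the "measure" of best fit, and the function name and sample values are hypothetical.

```python
def fit_constant_velocity(times, positions):
    """Least-squares fit of position = p0 + v * t along a single axis.

    Every past report is needed for each refit, which is why this
    batch approach does not scale to long-lived tracks.
    """
    n = len(times)
    t_mean = sum(times) / n
    p_mean = sum(positions) / n
    s_tt = sum((t - t_mean) ** 2 for t in times)
    s_tp = sum((t - t_mean) * (p - p_mean) for t, p in zip(times, positions))
    velocity = s_tp / s_tt
    p0 = p_mean - velocity * t_mean
    return p0, velocity

# Noise-free reports from a target starting at 2.0 and moving at 3.0 units per step.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
positions = [2.0 + 3.0 * t for t in times]
p0, velocity = fit_constant_velocity(times, positions)
predicted_next = p0 + velocity * 5.0  # extend the line to predict the next position
print(p0, velocity, predicted_next)
```

Applying the same function to each coordinate gives a two- or three-dimensional straight-line track under the same assumptions.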

A near-optimal method for addressing a large class of tracking problems was developed in 1960 by
R.E. Kalman. 2 His approach, referred to as Kalman filtering, involves the recursive fusion of noisy measurements
to produce an accurate estimate of the state of a system of interest. A key feature of the Kalman
filter is its representation of state estimates in terms of mean vectors and error covariance matrices, where
a covariance matrix provides an estimate (usually a conservative over-estimate) of the second moment
of the error distribution associated with the mean estimate. The square root of the estimated covariance
gives an estimate of the standard deviation. If the measurement errors in the sequence are statistically
independent, the Kalman filter produces a sequence of conservative fused estimates with diminishing
error covariances.
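
The recursive character of this fusion can be illustrated with the simplest possible case: a scalar, static state with no process noise, so that the mean vector and covariance matrix reduce to a single mean and variance. This is a deliberately reduced sketch of the measurement-update step only, not a full tracking filter. The function name, the prior, and the measurement values are assumptions.

```python
def kalman_update(mean, var, z, r):
    """Fuse a scalar estimate (mean, var) with a measurement z of variance r."""
    gain = var / (var + r)              # weight given to the new measurement
    new_mean = mean + gain * (z - mean)
    new_var = (1.0 - gain) * var        # never larger than the prior variance
    return new_mean, new_var

# Vague prior, then three independent measurements of a quantity near 10.
mean, var = 0.0, 100.0
history = []
for z in [10.2, 9.8, 10.1]:
    mean, var = kalman_update(mean, var, z, r=1.0)
    history.append((mean, var))

for m, v in history:
    print(f"mean={m:.3f}  variance={v:.4f}  std={v ** 0.5:.4f}")
```

Only the current mean and variance are stored between updates, which is what frees the method from retaining all past reports, and the variance shrinks with each fused measurement.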

Kalman’s work had a dramatic impact on the field of target tracking in particular and data fusion in 
general. By the mid-1960s, Kalman filtering was a standard methodology. It has become as central to 
multiple-target tracking as it has been to single-target tracking; however, it addresses only one aspect of 
the overall problem. 

3.1.2 Nearest Neighbors 

What multiple targets add to the tracking problem is the need to assign each incoming position report to 
a specific target track. The earliest mechanism for classifying reports was the nearest-neighbor rule. The 
idea of the rule is to estimate each object’s position at the time of a new position report, and then assign 
the report to the nearest such estimate (see Figure 3.2). This intuitively plausible approach is especially 
attractive because it decomposes the multiple-target tracking problem into a set of single-target problems. 

The nearest-neighbor rule is straightforward to apply when all tracks and reports are represented as 
points; however, there is no clear means for defining what constitutes “nearest neighbors” among tracks 
and reports with different error covariances. For example, if a sensor has an error variance of 1 cm, then
the probability that measurements 10 cm apart are from the same object is $O(10^{-20})$, whereas measurements
having a variance of 10 cm could be 20-30 centimeters apart and feasibly correspond to the same
object. Therefore, the appropriate measure of distance must reflect the relative uncertainties in the mean 
estimates. 

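The most widely used measure of the correlation between two mean and covariance pairs $\{x_1, P_1\}$ and $\{x_2, P_2\}$,
which are assumed to be Gaussian-distributed random variables, is 3,4

$$P_{\mathrm{ass}}(x_1, x_2) = \frac{1}{\sqrt{|2\pi (P_1 + P_2)|}} \exp\!\left(-\frac{1}{2}(x_1 - x_2)^T (P_1 + P_2)^{-1} (x_1 - x_2)\right) \tag{3.1}$$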



which reflects the probability that $x_1$ is a realization of $x_2$ or, symmetrically, the probability that $x_2$ is a
realization of $x_1$. If this quantity is above a given threshold (called a gate), then the two estimates are
considered to be feasibly correlated. If the assumption of Gaussianity does not hold exactly (and it
generally does not), then this measure is heuristically assumed (or hoped) to yield results that are at
least good enough to be used for ranking purposes (i.e., to say confidently that one measurement is more 
likely than another measurement to be associated with a given track). If this assumption approximately 
holds, then the gate will tend to discriminate high- and low-probability associations. Accordingly, the 
nearest-neighbor rule can be redefined to state that a report should be assigned to the track with which 
it has the highest association ranking. In this way, a multiple-target problem can still be decomposed 
into a set of single-target problems. 
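
A gated nearest-neighbor rule of this kind can be sketched for two-dimensional estimates. The association score used here is a Gaussian likelihood evaluated with the sum of the two covariances, which is assumed as a common form of such a measure. The function names, gate values, and numbers are hypothetical.

```python
import math

def association_likelihood(x1, P1, x2, P2):
    """Gaussian association score for 2-D means with 2x2 covariances (nested tuples)."""
    # Summed covariance S = P1 + P2, written out element by element.
    a = P1[0][0] + P2[0][0]
    b = P1[0][1] + P2[0][1]
    c = P1[1][0] + P2[1][0]
    d = P1[1][1] + P2[1][1]
    det = a * d - b * c
    dx, dy = x1[0] - x2[0], x1[1] - x2[1]
    # Squared Mahalanobis distance: [dx dy] S^-1 [dx dy]^T
    m2 = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * m2) / (2.0 * math.pi * math.sqrt(det))

def nearest_neighbor(report, tracks, gate):
    """Index of the track with the highest score above the gate, or None."""
    best_index, best_score = None, gate
    for i, (mean, cov) in enumerate(tracks):
        score = association_likelihood(report[0], report[1], mean, cov)
        if score > best_score:
            best_index, best_score = i, score
    return best_index

tight = ((0.01, 0.0), (0.0, 0.01))   # precisely estimated
loose = ((25.0, 0.0), (0.0, 25.0))   # poorly estimated
tracks = [((0.0, 0.0), tight), ((5.0, 0.0), loose)]
report = ((1.0, 0.0), tight)

print(nearest_neighbor(report, tracks, gate=1e-6))  # permissive gate
print(nearest_neighbor(report, tracks, gate=0.01))  # strict gate
```

In this example the report lies closer in Euclidean distance to the first track, but that track and the report are both so precise that a separation of one unit is implausible. The score therefore ranks the second, more uncertain track higher. With the stricter gate neither track qualifies and no assignment is made.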

The nearest-neighbor rule has strong intuitive appeal, but doubts and difficulties connected with it 
soon emerged. For example, early implementers of the method discovered problems in creating initial 
tracks for multiple targets. In the case of a single target, two reports can be accumulated to derive a 
velocity estimate, from which a track can be created. For multiple targets, however, there is no obvious
way to deduce such initial velocities. The first two reports received could represent successive positions
of a single object or the initial detection of two distinct objects. Every subsequent report could be the
continuation of a known track or the start of a new one. To make matters worse, almost every sensor
produces some background rate of spurious reports, which give rise to spurious tracks. Thus, the tracking
system needs an additional mechanism to recognize and delete tracks that do not receive any subsequent
confirming reports.

FIGURE 3.2 The nearest-neighbor rule is perhaps the simplest approach for determining which tracked object
produced a given sensor report. When a new position report arrives, all existing tracks are projected forward to the
time of the new measurement. (In this diagram, earlier target positions are indicated by dots and projected positions
by circles; the new position report is labeled.) Then, the distance from the report to each projected position is
calculated, and the report is associated with the nearest track. More generally, the distance calculation is computed
to reflect the relative uncertainties (covariances) associated with each track and report. In the situation depicted
above, the report would be assigned to Track 1, based purely on its Euclidean proximity to the report. If this assignment
is erroneous, the subsequent tracking process will be adversely affected.

Another difficulty with the nearest-neighbor rule becomes apparent when reports are misclassified, as 
will inevitably happen from time to time if the tracked objects are close together. A misassignment can 
cause the Kalman-filtering process to converge very slowly, or fail to converge altogether, in which case
the track cannot be predicted. Moreover, tracks updated with misassigned reports (or not updated at all) 
will tend to correlate poorly with subsequent reports and may, therefore, be mistaken as spurious by the 
track-deletion mechanism. Mistakenly deleted tracks then necessitate subsequent track initiations and a 
possible repetition of the process. 

3.1.3 Track Splitting and Multiple Hypotheses 

A robust solution to the problem of assignment ambiguities is to create multiple hypothesis tracks. Under 
this scheme, the tracking system does not have to commit immediately or irrevocably to a single assign- 
ment of each report. If a report is highly correlated with more than one track, an updated copy of each 
track can be created; subsequent reports can be used to determine which assignment is correct. As more 
reports come in, the track associated with the correct assignment will rapidly converge on the true target 
trajectory, whereas the falsely updated tracks are less likely to be correlated with subsequent reports. 

This basic technique is called track splitting . 3 One of its worrisome consequences is a proliferation in 
the number of tracks upon which a program must keep tabs. The proliferation can be controlled with 
the same track deletion mechanism used in the nearest-neighbor algorithm, which scans through all the 
tracks from time to time and eliminates those that have a low probability of association with recent 
reports. A more sophisticated approach to track splitting, called multiple-hypothesis tracking, maintains 
a history of track branchings, so that as soon as one branch is confirmed, the alternative branches can 
be pruned away. 

Track splitting in its various forms 5 is a widely applied strategy for handling the ambiguities inherent 
in correlating tracks with reports from multiple targets. It is also used to minimize the effects of spurious 
reports when tracking a single target. Nevertheless, some serious difficulties remain. First, track splitting 
does not completely decompose a multiple-target tracking problem into independent single-target problems, 
the way the nearest-neighbor strategy was intended to function. For example, two hypothesis tracks 
may lock onto the trajectory of a single object. Because both tracks are valid, the standard track-deletion 
mechanism cannot eliminate either of them. The deletion procedure has to be modified to detect 
redundant tracks and, therefore, cannot look at just one track at a time. This coupling between multiple 
tracks is theoretically troubling; however, experience has shown that it can be managed in practice at 
low computational cost. 

A second problem is the difficulty of deciding when a position report and a projected track are 
correlated closely enough to justify creating a new hypothesis track. If the correlation threshold is set too 
high, correct assignments may be missed so often as to prevent convergence of the Kalman filter. If the 
threshold is too low, the number of hypotheses could grow exponentially. The usual practice is to set the 
threshold low enough to ensure convergence, and then add another mechanism to limit the rate of 
hypothesis generation. A simple strategy is to select the n hypothesis candidates with the highest prob- 
abilities of association, where n is the maximum number of hypotheses that computational resource 
constraints will allow. This “greedy” method often yields good performance. 
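
The greedy selection described above can be sketched in a few lines. This is only an illustration under stated assumptions: the tuple layout (track id, report id, probability) and the function name `select_hypotheses` are hypothetical and do not come from the text.

```python
import heapq

def select_hypotheses(candidates, n):
    """Keep the n candidate assignments with the highest association probability.

    candidates: iterable of (track_id, report_id, probability) tuples
                (hypothetical layout chosen for this sketch).
    n:          maximum number of hypotheses the computing budget allows.
    """
    return heapq.nlargest(n, candidates, key=lambda c: c[2])

# Four candidate assignments for one report; only the best two are kept.
candidates = [
    ("t1", "r1", 0.62),
    ("t2", "r1", 0.30),
    ("t3", "r1", 0.05),
    ("t4", "r1", 0.03),
]
kept = select_hypotheses(candidates, n=2)
```

Because the selection looks only at the current probabilities, it can discard an assignment that later reports would have confirmed, which is the usual price of a greedy rule.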

Even with these enhancements, the tracking algorithm makes such prodigious demands on computing 
resources that large problems remain beyond practical reach. Monitoring the computation to see how 
much time is spent in various subtasks shows that calculating probabilities of association is, by far, the 
biggest expense. The program gets bogged down projecting target tracks to the time of a position report 
and calculating association probabilities. Because this is the critical section of the algorithm, further effort 
has focused on improving performance in this area. 

3.1.4 Gating 

The various calculations involved in estimating a probability of association are numerically intensive and 
inherently time consuming. Thus, one approach to speeding up the tracking procedure is to streamline 
or fine-tune these calculations — to encode them more efficiently without changing their fundamental 
nature. An obvious example is to calculate 

$$\mathrm{dist}^2(x_1, x_2) = (x_1 - x_2)\,(P_1 + P_2)^{-1}\,(x_1 - x_2)^{T} \qquad (3.2)$$

rather than the full probability of association. This measure is proportional to the logarithm of the 
probability of association and is commonly referred to as the Mahalanobis distance or log-likelihood 
measure. 4 Applying a suitably chosen threshold to this quantity yields a method for obtaining the same 
set of feasible pairs, while avoiding a large number of numerically intensive calculations. 
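
As a hedged illustration of thresholding this distance, the sketch below evaluates the squared Mahalanobis distance for two-dimensional estimates using the closed-form inverse of the summed 2x2 covariance. The function name, the restriction to two dimensions, and the threshold value 9.21 (the 99% point of a chi-square distribution with two degrees of freedom) are assumptions made for the example, not values taken from the text.

```python
def mahalanobis_sq(x1, P1, x2, P2):
    """Squared Mahalanobis distance between 2-D estimates (x1, P1) and (x2, P2)."""
    dx, dy = x1[0] - x2[0], x1[1] - x2[1]
    # Combined covariance S = P1 + P2, written out element by element.
    a = P1[0][0] + P2[0][0]
    b = P1[0][1] + P2[0][1]
    c = P1[1][0] + P2[1][0]
    d = P1[1][1] + P2[1][1]
    det = a * d - b * c
    if det <= 0:
        raise ValueError("combined covariance must be positive definite")
    # S^-1 = (1/det) * [[d, -b], [-c, a]]; form (dx, dy) S^-1 (dx, dy)^T.
    return (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det

GATE = 9.21  # assumed threshold: 99% chi-square point for 2 degrees of freedom

identity = [[1.0, 0.0], [0.0, 1.0]]
near = mahalanobis_sq((0.0, 0.0), identity, (1.0, 1.0), identity)  # 1.0
far = mahalanobis_sq((0.0, 0.0), identity, (3.0, 4.0), identity)   # 12.5
feasible = [d2 <= GATE for d2 in (near, far)]                      # [True, False]
```

No exponential or normalizing determinant factor is evaluated for pairs that fail the threshold, which is where the saving comes from.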

An approach for further reducing the number of computations is to minimize the number of log-likelihood 
calculations by performing a simpler preliminary screening of tracks and sensor reports. Only if 
a track-report pair passes this computationally inexpensive feasibility check is there a need to complete 
the log-likelihood calculation. Multiple gating tests also can be created for successively weeding out 
infeasible pairs, so that each gate involves more calculations but is applied to considerably fewer pairs 
than the previous gate. 

Several geometric tests could serve as gating criteria. For example, if each track is updated, on average, 
every five seconds, and the targets are known to have a maximum speed of 10 kilometers per second, a 
track and report more than 50 kilometers apart are not likely to be correlated. A larger distance may be 
required to take into account the uncertainty measures associated with both the tracks and the reports. 
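
The geometric test in this example can be written as a one-line distance check. The sketch below uses the figures quoted above (10 kilometers per second, five-second updates); the function name `coarse_gate` and the `margin_km` parameter, which stands in for the extra allowance for track and report uncertainty, are hypothetical.

```python
import math

MAX_SPEED_KM_S = 10.0    # maximum target speed from the example in the text
UPDATE_INTERVAL_S = 5.0  # average update interval from the example in the text

def coarse_gate(track_pos, report_pos, margin_km=0.0):
    """Return True if the pair survives the cheap distance test.

    margin_km is an assumed allowance for position uncertainty in the
    track and the report; the text only says a larger distance may be needed.
    """
    limit_km = MAX_SPEED_KM_S * UPDATE_INTERVAL_S + margin_km
    return math.dist(track_pos, report_pos) <= limit_km

inside = coarse_gate((0.0, 0.0, 0.0), (30.0, 40.0, 0.0))                  # exactly 50 km
outside = coarse_gate((0.0, 0.0, 0.0), (60.0, 0.0, 0.0))                  # 60 km
rescued = coarse_gate((0.0, 0.0, 0.0), (60.0, 0.0, 0.0), margin_km=15.0)  # limit 65 km
```

Only pairs that pass such a test would go on to the more expensive log-likelihood calculation.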

Simple gating strategies can successfully reduce the numerical overhead of the correlation process and 
increase the number of targets that can be tracked in real time. Unfortunately, the benefits of simple gating 
diminish as the number of targets increases. Specifically, implementers of gating algorithms have found 
that increasing the number of targets by a factor of 20 often increases the computational burden by a 
factor of more than 100. Moreover, the largest percentage of computation time is still spent in the 
correlation process, although now the bulk of the demand is for simple distance calculations within the 
gating algorithm. This implies that the quadratic growth in the number of gating tests is more critical 
than the constant numerical overhead associated with the individual tests. In other words, simple gating 
can reduce the average cost of each comparison, but what is really needed is a method to reduce the sheer 
number of comparisons. Some structure must be imposed on the set of tracks that will allow correlated 
track-report pairs to be identified without requiring every report to be compared with every track. 

The gating problem is difficult conceptually because it demands that most pairs of tracks and reports 
be excluded from consideration without ever being examined. At the same time, no track-report pair 
whose probability of association exceeds the correlation threshold can be disregarded. Until the 1980s, 
the consensus in the tracking literature was that these constraints were impossible to satisfy simulta- 
neously. Consequently, the latter constraint was often sacrificed by the use of methods that did allow 
some, but hopefully few, track-report pairs to be missed even though their probabilities of association 
exceeded the threshold. This seemingly reasonable compromise, however, has led to numerous ad hoc 
schemes that either fail to adequately limit the number of comparisons or fail to adequately limit the 
number of missed correlations. Some approaches are susceptible to both problems. 

Most of the ad hoc strategies depend heavily on the distribution of the targets. A common approach 
is to identify clusters of targets that are sufficiently separated that reports from targets in one cluster will 
never have a significant probability of association with tracks from another cluster. 6 This allows the 
correlation process to determine from which cluster a particular report could have originated and then 
compare the report only to the tracks in that cluster. The problem with this approach is that the number 
of properly separated clusters depends on the distribution of the targets and, therefore, cannot be 
controlled by the clustering algorithm (Figure 3.3). If O(n) tracks are partitioned into O(n) clusters, each 
consisting of a constant number of tracks, or into a constant number of clusters of O(n) tracks, the 
method still results in a computational cost that is proportional to the comparison of every report to 
every track. Unfortunately, most real-world tracking problems tend to be close to one of these extremes. 

A gating strategy that avoids some of the distribution problems associated with clustering involves 
partitioning the space in which the targets reside into grid cells. Each track can then be assigned to a cell 
according to its mean projected position. In this way, the tracks that might be associated with a given 
report can be found by examining only those tracks in cells within close proximity to the report’s cell. 
The problem with this approach is that its performance depends heavily on the size of the grid cells, as 
well as on the distribution of the targets (Figure 3.4). If the grid cells are large and the targets are densely 
distributed in a small region, every track will be within a nearby cell. Conversely, if the grid cells are 
small, the algorithm may spend as much time examining cells (most of which may be empty) as would 
be required to simply examine each track. 
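
A minimal sketch of grid-based gating follows, assuming tracks are given as a dictionary from an identifier to a mean projected position and that "close proximity" means the report's cell plus its immediate neighbors. Both choices, and all function names, are assumptions for illustration.

```python
from collections import defaultdict
from itertools import product

def cell_of(pos, cell_size):
    """Map a position to the integer index of the grid cell containing it."""
    return tuple(int(c // cell_size) for c in pos)

def build_grid(tracks, cell_size):
    """Assign each track to a cell according to its mean projected position."""
    grid = defaultdict(list)
    for track_id, pos in tracks.items():
        grid[cell_of(pos, cell_size)].append(track_id)
    return grid

def candidates(grid, report_pos, cell_size):
    """Collect tracks in the report's cell and in all adjacent cells."""
    center = cell_of(report_pos, cell_size)
    found = []
    for offset in product((-1, 0, 1), repeat=len(center)):
        cell = tuple(c + o for c, o in zip(center, offset))
        found.extend(grid.get(cell, ()))
    return found

tracks = {"a": (5.0, 5.0), "b": (12.0, 3.0), "c": (95.0, 95.0)}
grid = build_grid(tracks, cell_size=10.0)
near_report = candidates(grid, (8.0, 8.0), cell_size=10.0)
```

The sketch makes both weaknesses visible: with a large `cell_size` every track lands in one of the neighboring cells, while with a small one the loop over neighbors mostly visits empty cells (and in k dimensions it always visits 3^k of them).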

3.1.5 Binary Search and kd-Trees 

The deficiencies of grid methods suggest the need for a more flexible data structure. The main requirement 
imposed on the data structure has already been mentioned — it must allow all proximate track-report 
pairs to be identified without having to compare every report with every track (unless every track is 
within the prescribed proximity to every report). 

A clue to how real-time gating might be accomplished comes from one of the best-known algorithms 
in computer science: binary search. Suppose one is given a sorted list of n numbers and asked to find 
out whether or not a specific number, q, is included in the list. The most obvious search method is simply 
to compare q with each number in sequence; in the worst case (when q is the last number or is not 
present at all), the search requires n comparisons. There is a much better way. Because the list is sorted, 
if q is found to be greater than a particular element of the list, one can exclude from further consideration 
not only that element but all those that precede it in the list. This principle is applied optimally in binary 
search. The algorithm is recursive: first compare q to the median value in the list of numbers (by 
definition, the median will be found in the middle of a sorted list). If q is equal to the median value, 
then stop, and report that the search was successful. If q is greater than the median value, then apply the 
same procedure recursively to the sublist greater than the median; otherwise apply it to the sublist less 
than the median (Figure 3.5). Eventually either q will be found (it will be equal to the median of some 
sublist) or a sublist will turn out to be empty, at which point the procedure terminates and reports 
that q is not present in the list. 

FIGURE 3.3 Clustering algorithms may produce spatially large clusters with few points and spatially small ones 
with many points. 

FIGURE 3.4 Grids may have a few cells with many points, while the remaining cells contain few or no points. 

FIGURE 3.5 Each node in a binary search tree stores the median value of the elements in its subtree. Searching 
the tree requires a comparison at each node to determine whether the left or right subtree should be searched. 

The efficiency of this process can be analyzed as follows. At every step, half of the remaining elements 
in the list are eliminated from consideration. Thus, the total number of comparisons is equal to the 
number of halvings, which in turn is O(log n). For example, if n is 1,000,000, then only 20 comparisons 
are needed to determine if a given number is in the list. 
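
The halving argument can be checked with a short sketch. The version below is iterative rather than recursive and counts the number of elements probed; the function name and the returned step counter are additions made for illustration only.

```python
def binary_search(sorted_list, q):
    """Return (found, probes) for q in a sorted list, halving the range each step."""
    lo, hi = 0, len(sorted_list) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2          # middle of the remaining sublist (its median)
        probes += 1
        if sorted_list[mid] == q:
            return True, probes
        if sorted_list[mid] < q:
            lo = mid + 1              # discard the median and everything before it
        else:
            hi = mid - 1              # discard the median and everything after it
    return False, probes              # the sublist became empty

data = list(range(0, 2_000_000, 2))   # one million sorted even numbers
found_last, probes_last = binary_search(data, 1_999_998)
found_odd, probes_odd = binary_search(data, 7)
# For a list of 1,000,000 elements, neither search probes more than 20 elements.
```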

Binary search can also be used to find all elements of the list that are within a specified range of values 
(min, max). Specifically, it can be applied to find the position in the list of the largest element less than 
min and the position of the smallest element greater than max. The elements between these two positions 
then represent the desired set. Finding the positions associated with min and max requires O(log n) 
comparisons. Assuming that some operation will be carried out on each of the m elements of the solution 
set, the overall computation time for satisfying a range query scales as O(log n + m). 
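
A one-dimensional range query of this kind can be sketched with Python's standard `bisect` module, which performs the two binary searches. Treating the range as inclusive at both ends is an assumption of the example.

```python
from bisect import bisect_left, bisect_right

def range_query(sorted_list, lo, hi):
    """Return all elements e of a sorted list with lo <= e <= hi."""
    start = bisect_left(sorted_list, lo)   # first position holding a value >= lo
    end = bisect_right(sorted_list, hi)    # first position holding a value > hi
    return sorted_list[start:end]          # the m elements between the two positions

values = [1, 3, 5, 7, 9, 11]
hits = range_query(values, 4, 9)           # [5, 7, 9]
```

The two `bisect` calls account for the logarithmic term, and copying out the slice accounts for the term proportional to m.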

Extending binary search to multiple dimensions yields a kd-tree. 7 This data structure permits the fast 
retrieval of, for example, all 3-D points in a data set whose x coordinate is in the range ($x_{\min}$, $x_{\max}$), whose 
y coordinate is in the range ($y_{\min}$, $y_{\max}$), and whose z coordinate is in the range ($z_{\min}$, $z_{\max}$). The kd-tree 
for k = 3 is constructed as follows: The first step is to list the x coordinates of the points and choose the 
median value, then partition the volume by drawing a plane perpendicular to the x-axis through this 
point. The result is to create two subvolumes, one containing all the points whose x coordinates are less 
than the median and the other containing the points whose x coordinates are greater than the median. 
The same procedure is then applied recursively to the two subvolumes, except that now the partitioning 
planes are drawn perpendicular to the y-axis and they pass through points that have median values of 
the y coordinate. The next round uses the z coordinate, and then the procedure returns cyclically to the 
x coordinate. The recursion continues until the subvolumes are empty.* 



* An alternative generalization of binary search to multiple dimensions is to partition the dataset at each stage 
according to its distance from a selected set of points; 8-14 those that are less than the median distance comprise one 
branch of the tree, and those that are greater comprise the other. These data structures are very flexible because they 
offer the freedom to use an appropriate application-specific metric to partition the dataset; however, they are also 
much more computationally intensive because of the number of distance calculations that must be performed. 
FIGURE 3.6 A kd-tree is analogous to an ordinary binary search tree, except that each node stores the median of 
the multidimensional elements in its subtree projected onto one of the coordinate axes. A kd-tree partitions on a 
different coordinate at each level in the tree. 

Searching the subdivided volume for the presence of a specific point with given x, y, and z coordinates 
is a straightforward extension of standard binary search. As in the one-dimensional case, the search 
proceeds as a series of comparisons with median values, but now attention alternates among the three 
coordinates. First the x coordinates are compared, then the y, then the z, and so on (Figure 3.6). In the 
end, either the chosen point will be found to lie on one of the median planes, or the procedure will come 
to an empty subvolume. 

Searching for all of the points that fall within a specified interval is somewhat more complicated. The 
search proceeds as follows: If $x_{\min}$ is less than the median-value x coordinate, the left subvolume must be 
examined. If $x_{\max}$ is greater than the median value of x, the right subvolume must be examined. At the 
next level of recursion, the comparison is done using $y_{\min}$ and $y_{\max}$, then $z_{\min}$ and $z_{\max}$. 
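
The construction and the range search can be sketched together. This is a simplified illustration, not the implementation used in any tracking system: it re-sorts the points at every level (simple, but not the fastest way to build the tree), stores one point per node, and uses non-strict comparisons so that points sharing a median coordinate are handled. The names `Node`, `build`, and `range_search` are hypothetical.

```python
from typing import NamedTuple, Optional

class Node(NamedTuple):
    point: tuple              # the median point stored at this node
    axis: int                 # coordinate used to partition at this level
    left: Optional["Node"]    # points whose coordinate is <= the median
    right: Optional["Node"]   # points whose coordinate is >= the median

def build(points, depth=0):
    """Build a kd-tree by splitting on the median, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return Node(pts[mid], axis,
                build(pts[:mid], depth + 1),
                build(pts[mid + 1:], depth + 1))

def range_search(node, lo, hi, out=None):
    """Collect every stored point p with lo[i] <= p[i] <= hi[i] for all i."""
    if out is None:
        out = []
    if node is None:
        return out
    p, a = node.point, node.axis
    if all(l <= c <= h for l, c, h in zip(lo, p, hi)):
        out.append(p)
    if lo[a] <= p[a]:         # the query range reaches into the left subvolume
        range_search(node.left, lo, hi, out)
    if hi[a] >= p[a]:         # the query range reaches into the right subvolume
        range_search(node.right, lo, hi, out)
    return out

points = [(2, 3, 1), (5, 4, 7), (9, 6, 2), (4, 7, 9), (8, 1, 5), (7, 2, 6)]
tree = build(points)
inside = range_search(tree, lo=(4, 1, 4), hi=(8, 5, 8))
```

When the query range straddles the median at a node, both subtrees are visited, which is the source of the multidimensional search penalty.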

A detailed analysis 15-17 of the algorithm reveals that for k dimensions (provided that k is greater than 1), 
the number of comparisons performed during the search can be as high as $O(n^{1-1/k} + m)$; thus in three 
dimensions the search time is proportional to $O(n^{2/3} + m)$. In the task of matching n reports with n 
tracks, the range query must be repeated n times, so the search time scales as $O(n \cdot n^{2/3} + m)$ or 
$O(n^{5/3} + m)$. This scaling is better than quadratic, but not nearly as good as the logarithmic scaling 
observed in the one-dimensional case, which works out for n range queries to be $O(n \log n + m)$. The 
reason for the penalty in searching a multidimensional tree is the possibility at each step that both subtrees 
will have to be searched without necessarily finding an element that satisfies the query. (In one dimension, 
a search of both subtrees implies that the median value satisfies the query.) In practice, however, this 
seldom happens, and the worst-case scaling is rarely seen. Moreover, for query ranges that are small 
relative to the extent of the dataset, as they typically are in gating applications, the observed query 
time for kd-trees is consistent with $O(\log^{1+\varepsilon} n)$, where $\varepsilon > 0$. 

3.2 Ternary Trees 

The kd-tree is provably optimal for satisfying multidimensional range queries if one is constrained to 
using only linear (i.e., O(n)) storage. 16,17 Unfortunately, it is inadequate for gating purposes because the 
track estimates have spatial extent due to uncertainty in their exact position. In other words, a kd-tree 
would be able to identify all track points that fall within the observation uncertainty bounds. It would 
fail, however, to return any imprecisely localized map item whose uncertainty region intersects the 
observation region, but whose mean position does not. Thus, the gating problem requires a data structure 
that stores sized objects and is able to retrieve those objects that intersect a given query region associated 
with an observation. 

FIGURE 3.7 Transferring uncertainty from tracks to reports reduces intersection queries to range queries. Panel 
labels: "If the position uncertainties are thresholded, then gating requires intersection detection." and "If the 
largest track radius is added to all the report radii, then the tracks can be treated as points." 

©2001 CRC Press LLC 


One approach for solving this problem is to shift all of the uncertainty associated with the tracks onto 
the reports. 18,19 The nature of this transfer is easy to understand in the simple case of a track and a report 
whose error ellipsoids are spherical and just touching. Reducing the radius of the track error sphere to 
zero, while increasing the radius of the report error sphere by an equal amount, leaves the enlarged report 
sphere just touching the point representing the track, so the track still falls within the gate of the report 
(Figure 3.7). Unfortunately, when this idea is applied to multiple tracks and reports, the query region 
for every report must be enlarged in all directions by an amount large enough to accommodate the largest 
error radius associated with any track. Techniques have been devised to find the minimum enlargement 
necessary to guarantee that every track correlated with a given report will be found; 19 however, many 
tracks with large error covariances can result in such large query regions that an intolerable number of 
uncorrelated tracks will also be found. 
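
The cost of this transfer is easy to see in a small sketch with spherical error regions. The data layout (a dictionary from track identifier to a position and an error radius) and the function names are assumptions for illustration; the brute-force loop stands in for the range query that a kd-tree would answer.

```python
import math

def inflated_radius(report_radius, track_radii):
    """Enlarge the report gate by the largest error radius of any track."""
    return report_radius + max(track_radii, default=0.0)

def gated_tracks(report_pos, report_radius, tracks):
    """Treat tracks as points and return those inside the enlarged report gate."""
    r = inflated_radius(report_radius, [rad for _, rad in tracks.values()])
    return [tid for tid, (pos, _) in tracks.items()
            if math.dist(pos, report_pos) <= r]

def truly_touching(report_pos, report_radius, tracks):
    """Reference answer: tracks whose own error sphere meets the report's sphere."""
    return [tid for tid, (pos, rad) in tracks.items()
            if math.dist(pos, report_pos) <= report_radius + rad]

tracks = {
    "a": ((1.5, 0.0), 1.0),   # overlaps the report
    "b": ((4.0, 0.0), 0.5),   # small error, does not overlap the report
    "c": ((20.0, 0.0), 5.0),  # distant track with a large error radius
}
gated = gated_tracks((0.0, 0.0), 1.0, tracks)    # ['a', 'b']
exact = truly_touching((0.0, 0.0), 1.0, tracks)  # ['a']
```

Track "b" is returned only because the distant track "c" forced every gate to grow by five units, so no correlated track is missed but uncorrelated ones are admitted.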




FIGURE 3.8 The intersection of error boxes offers a preliminary indication that a track and a report probably 
correspond to the same object. A more definitive test of correlation requires a computation to determine the extent 
to which the error ellipses (or their higher-dimensional analogs) overlap, but such computations can be too time 
consuming when applied to many thousands of track/report pairs. Comparing bounding boxes is more computa- 
tionally efficient; if they do not intersect, an assumption can be made that the track and report do not correspond 
to the same object. However, intersection does not necessarily imply that they do correspond to the same object. 
False positives must be weeded out in subsequent processing. 
FIGURE 3.9 Structure of a ternary tree. In a ternary tree, the boxes in the left subtree fall on one side of the 
partitioning (split) plane; the boxes in the right subtree fall to the other side of the plane; and the boxes in the middle 
subtree are strictly cut by the plane. 



A solution that avoids the need to inflate the search volumes is to use a data structure that can satisfy 
ellipsoid intersection queries instead of range queries. One such data structure that has been applied in 
large-scale tracking applications is an enhanced form of kd-tree that stores coordinate-aligned boxes. 1,20 
A box is defined as the smallest rectilinear shape, with sides parallel to the coordinate axes, that can 
entirely surround a given error ellipsoid (see Figure 3.8). Because the axes of the ellipse may not corre- 
spond to those of the coordinate system, the box may differ significantly in size and shape from the 
ellipse it encloses. The problem of determining optimal approximating boxes is presented in Reference 21. 
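
As a hedged sketch of the box construction, the code below uses the fact that the axis-aligned box enclosing the ellipsoid $(x - \mu)^T P^{-1} (x - \mu) \le g^2$ has half-width $g\sqrt{P_{ii}}$ along coordinate i. The gate size `gate=3.0` and the function names are assumptions for the example; the text does not prescribe them.

```python
import math

def bounding_box(mean, cov, gate=3.0):
    """Smallest axis-aligned box enclosing the error ellipsoid of (mean, cov).

    gate is the assumed size of the ellipsoid in standard deviations.
    Only the diagonal of cov matters for the box, whatever the ellipsoid's tilt.
    """
    half = [gate * math.sqrt(cov[i][i]) for i in range(len(mean))]
    lo = tuple(m - h for m, h in zip(mean, half))
    hi = tuple(m + h for m, h in zip(mean, half))
    return lo, hi

def boxes_intersect(a, b):
    """Two boxes intersect only if their intervals overlap in every coordinate."""
    (alo, ahi), (blo, bhi) = a, b
    return all(al <= bh and bl <= ah
               for al, ah, bl, bh in zip(alo, ahi, blo, bhi))

track_box = bounding_box((0.0, 0.0), [[4.0, 1.0], [1.0, 1.0]], gate=1.0)
# track_box == ((-2.0, -1.0), (2.0, 1.0))
overlaps = boxes_intersect(track_box, ((1.5, 0.5), (3.0, 3.0)))
disjoint = boxes_intersect(track_box, ((2.5, 0.0), (3.0, 1.0)))
```

Because the off-diagonal covariance terms are ignored, a strongly tilted ellipse fills only part of its box, which is why box intersection is a screening test rather than a final decision.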

An enhanced form of the kd-tree is needed for searches in which one range of coordinate values is 
compared with another range, rather than the simpler case in which a range is compared with a single 
point. A binary tree will not serve this purpose because it is not possible to say that one interval is entirely 
greater than or less than another when they intersect. What is needed is a ternary tree, with three 
descendants per node (Figure 3.9). At each stage in a search of the tree, the maximum value of one 
interval is compared with the minimum of the other, and vice versa. These comparisons can potentially 
eliminate either the left subtree or the right subtree. In either case, examining the middle subtree — the 
one made up of nodes representing boxes that might intersect the query interval — is necessary. Because 
all of the boxes in a middle subtree intersect the plane defined by the split value, however, the dimen- 
sionality of the subtree can be reduced by one, causing subsequent searches to be more efficient. 
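
The three-way partition can be illustrated in one dimension, where boxes reduce to intervals and the middle "subtree" reduces to a plain list (its dimensionality drops to zero). This is a simplified sketch under assumptions that are not from the text: the split value is taken as the median of the interval midpoints, and the class name `TernaryNode` is hypothetical.

```python
class TernaryNode:
    """One-dimensional ternary tree over closed intervals (lo, hi)."""

    def __init__(self, intervals):
        # Assumed split rule: median of the interval midpoints. The interval that
        # supplies this midpoint always straddles the split, so both the left and
        # right groups are strictly smaller and the recursion terminates.
        mids = sorted((lo + hi) / 2 for lo, hi in intervals)
        self.split = mids[len(mids) // 2]
        left = [iv for iv in intervals if iv[1] < self.split]
        right = [iv for iv in intervals if iv[0] > self.split]
        self.middle = [iv for iv in intervals if iv[0] <= self.split <= iv[1]]
        self.left = TernaryNode(left) if left else None
        self.right = TernaryNode(right) if right else None

    def query(self, qlo, qhi, out):
        """Append to out every stored interval that intersects [qlo, qhi]."""
        # The middle group must always be examined.
        out.extend(iv for iv in self.middle if iv[0] <= qhi and qlo <= iv[1])
        # Left intervals end below the split, so they matter only if qlo does too.
        if self.left is not None and qlo < self.split:
            self.left.query(qlo, qhi, out)
        # Right intervals start above the split, so they matter only if qhi does too.
        if self.right is not None and qhi > self.split:
            self.right.query(qlo, qhi, out)
        return out

intervals = [(0, 2), (1, 5), (4, 6), (7, 9), (8, 12), (10, 11)]
tree = TernaryNode(intervals)
hits = tree.query(5, 8, [])
```

In higher dimensions the middle group would itself be organized as a tree over the remaining coordinates, which this sketch does not attempt.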

The middle subtree represents obligatory search effort; therefore, one goal is to minimize the number 
of boxes that straddle the split value. However, if most of the nodes fall to the left or right of the split 
value, then few nodes will be eliminated from the search, and query performance will be degraded. Thus, 
a tradeoff must be made between the effects of unbalance and of large middle subtrees. Techniques have 
been developed for adapting ternary trees to exploit distribution features of a given set of boxes, 20 but 
they cannot easily be applied when boxes are inserted and deleted dynamically. The ability to dynamically 
update the search structure can be very important in some applications; this topic is addressed in 
subsequent sections of this chapter. 
3.3 Priority kd-Trees 



The ternary tree represents a very intuitive approach to extending the kd-tree for the storage of boxes. 
The idea is that, in one dimension, if a balanced tree is constructed from the minimum values of each 
interval, then the only problematic cases are those intervals whose min endpoints are less than a split 
value while their max endpoints are greater. Thus, if these cases can be handled separately (i.e., in separate 
subtrees), then the rest of the tree can be searched the same way as an ordinary binary search tree. This 
approach fails because it is not possible to ensure simultaneously that all subtrees are balanced and that 
the extra subtrees are sufficiently small. As a result, an entirely different strategy is required to bound 
the worst-case performance. 

A technique is known for extending binary search to the problem of finding intersections among one- 
dimensional intervals. 22,23 The priority search tree is constructed by sorting the intervals according to the 
first coordinate as in an ordinary one-dimensional binary search tree. Then down every possible search 
path, the intervals are ordered by the second endpoint. Thus, the intervals encountered by always 
searching the left subarray will all have values for their first endpoint that are less than those of intervals 
with larger indices (i.e., to their right). At the same time, though, the second endpoints in the sequence 
of intervals will be in ascending order. Because any interval whose second endpoint is less than the first 
endpoint of the query interval cannot possibly produce an intersection, an additional stopping criterion 
is added to the ordinary binary search algorithm. 

The priority search tree avoids the problems associated with middle subtrees in a ternary tree by storing 
the min endpoints in an ordinary balanced binary search tree, while storing the max endpoints in priority 
queues stored along each path in the tree. This combination of data structures permits the storage of n 
intervals, such that intersection queries can be satisfied in worst-case O(log n + m) time, and insertions 
and deletions of intervals can be performed in worst-case O(log n) time. Thus, the priority search tree 
generalizes binary search on points to the case of intervals, without any penalty in terms of errors. 
Unfortunately, the priority search tree is defined purely for intervals in one dimension. 

Whereas the kd-tree can store multidimensional points, but not multidimensional ranges, the priority 
search tree can store one-dimensional ranges, but not multiple dimensions. The question that arises is 
whether the kd-tree can be extended to store boxes efficiently, or whether the priority search tree can be 
extended to accommodate the analogue of intervals in higher dimensions (i.e., boxes). The answer to 
the question is “yes” for both data structures, and the solution is, in fact, a combination of the two. 

A priority kd-tree 24 is defined as follows: given a set S of k-dimensional box intervals ($lo_i$, $hi_i$), $1 \le i \le k$, 
a priority kd-tree consists of a kd-tree constructed from the lo endpoints of the intervals with a priority 
set containing up to k items stored at each node (Figure 3.10).* The items stored at each node are the 
minimum set so that the union of the hi endpoints in each coordinate includes a value greater than the 
corresponding hi endpoint of any interval of any item in the subtree. Searching the tree proceeds exactly 
as for ordinary priority search trees, except that the intervals compared at each level in the tree cycle 
through the k dimensions as in a search of a kd-tree. 
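
The idea can be sketched as a static structure. The code below is a simplified illustration, not the data structure of Reference 24: instead of a priority set of items, each node simply records the largest hi endpoint per coordinate found anywhere in its subtree, and the left subtree is visited whenever those recorded maxima do not rule it out (a conservative choice made for this sketch). Boxes are represented as (lo, hi) pairs of coordinate tuples, and all names are hypothetical.

```python
class PKDNode:
    """kd-tree on the lo corners of boxes, with per-subtree maximum hi values."""

    def __init__(self, boxes, depth=0):
        k = len(boxes[0][0])
        self.axis = depth % k
        boxes = sorted(boxes, key=lambda b: b[0][self.axis])
        mid = len(boxes) // 2
        self.box = boxes[mid]                      # box with the median lo value
        # Priority fields: largest hi endpoint per coordinate in this subtree.
        self.max_hi = tuple(max(b[1][j] for b in boxes) for j in range(k))
        left, right = boxes[:mid], boxes[mid + 1:]
        self.left = PKDNode(left, depth + 1) if left else None
        self.right = PKDNode(right, depth + 1) if right else None

def intersects(a, b):
    """Closed boxes intersect if their intervals overlap in every coordinate."""
    return all(al <= bh and bl <= ah
               for al, ah, bl, bh in zip(a[0], a[1], b[0], b[1]))

def query(node, q, out):
    """Append to out every stored box that intersects the query box q."""
    if node is None:
        return out
    # Stop if the query starts beyond every hi endpoint below this node.
    if any(q[0][j] > node.max_hi[j] for j in range(len(q[0]))):
        return out
    if intersects(node.box, q):
        out.append(node.box)
    query(node.left, q, out)
    # Boxes on the right start at or after the median lo, so they can matter
    # only if the query extends at least that far in this coordinate.
    if q[1][node.axis] >= node.box[0][node.axis]:
        query(node.right, q, out)
    return out

boxes = [((0, 0), (2, 2)), ((1, 1), (4, 3)), ((5, 5), (6, 7)),
         ((3, 0), (5, 1)), ((8, 8), (9, 9))]
root = PKDNode(boxes)
hits = query(root, ((1.5, 0.5), (3.5, 1.5)), [])
```

The sketch supports queries only; the dynamic insertion and deletion discussed in the text require additional machinery that is not shown here.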

The priority kd-tree can be used to efficiently satisfy box intersection queries. Just as important, 
however, is the fact that it can be adapted to accommodate the dynamic insertion and deletion of boxes 
in optimal O(log n) time by replacing the kd-tree structure with a divided kd-tree structure. 25 The 
difference between the divided kd-tree and an ordinary kd-tree is that the divided variant constructs a 
d-layered tree in which each layer partitions the data structure according to only one of the d coordinates. 
In three dimensions, for example, the first layer would partition on the x coordinate, the next layer on y, 
and the last layer on z. The number of levels per layer/coordinate is determined so as to minimize query 
time complexity. The reason for stratifying the tree into layers for the different coordinates is to allow 
updates within the different layers to be treated just like updates in ordinary one-dimensional binary trees. 

* Other data structures have been independently called “priority kd-trees” in the literature, but they are designed 
for different purposes. 

FIGURE 3.10 Structure of a priority kd-tree. The priority kd-tree stores multidimensional boxes, instead of vectors. 
A box is defined by an interval ($lo_i$, $hi_i$) for each coordinate i. The partitioning is applied to the lo coordinates 
analogously to an ordinary kd-tree. The principal difference is that the maximum hi value for each coordinate is 
stored at each node. These hi values function analogously to the priority fields of a priority search tree. In searching 
a priority kd-tree, the query box is compared to each of the stored values at each visited node. If the node partitions 
on coordinate i, then the search proceeds to the left subtree if $lo_i$ is less than the median $lo_i$ associated with the node. 
If $hi_i$ is greater than the median $lo_i$, then the right subtree must be searched. The search can be terminated, however, 
if for any j, $lo_j$ of the query box is greater than the $hi_j$ stored at the node. (In the figure, successive levels partition 
according to coordinates 1, 2, and 3, and each node holds {median $lo_i$, max $hi_1$, max $hi_2$, ..., max $hi_k$}.) 

Associating priority fields with the different layers results in a dynamic variant of the priority kd-tree, 
which is referred to as a Layered Box Tree. Note that the i priority fields, for coordinates 1,...,i, need to be 
maintained at level i. This data structure has been proven 26 to be maintainable at a cost of O(log n) time 
per insertion or deletion and can satisfy box intersection queries in $O(n^{1-1/k} \log^{1/k} n + m)$ time, where m is the 
number of boxes in S that intersect a given query box b. A relatively straightforward variant 27 of the data 
structure improves the query complexity to $O(n^{1-1/k} + m)$, which is optimal. 

The priority kd-tree is optimal among the class of linear-sized data structures, i.e., ones using only 
O(n) storage, but asymptotically better $O(\log^k n + m)$ query complexity is possible if $O(n \log^{k-1} n)$ storage 
is used. 16,17 However, the extremely complex structure, called a range-segment tree, requires $O(\log^k n)$ 
update time, and the query performance is $O(\log^k n + m)$. Unfortunately, this query complexity holds 
in the average case, as well as in the worst case, so it can be expected to provide superior query performance 
in practice only when n is extremely large. For realistic distributions of objects, however, it may never 
provide better query performance in practice. Whether or not that is the case, the range-segment tree is 
almost never used in practice because the values of $n^{1-1/k}$ and $\log^k n$ are comparable even for n as large 
as 1,000,000, and for datasets of that size the storage for the range-segment tree is multiplied by a factor 
of $\log^2(1{,}000{,}000) \approx 400$. 
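
The comparison can be checked with a few lines of arithmetic for n = 1,000,000 and k = 3. Base-2 logarithms are assumed here, which is consistent with the figure of 20 comparisons quoted earlier for binary search; constant factors hidden by the O-notation are ignored.

```python
import math

n, k = 1_000_000, 3
log_n = math.log2(n)                      # about 19.93

kd_query_term = n ** (1 - 1 / k)          # n^(1 - 1/k) = n^(2/3), about 10,000
range_segment_term = log_n ** k           # log^k n, about 7,900
storage_factor = log_n ** (k - 1)         # log^(k-1) n, about 397 (roughly 400)
```

The two query terms differ by well under a factor of two, while the storage grows by a factor of roughly 400, which is the trade-off described above.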

3.3.1 Applying the Results 

The method in which multidimensional search structures are applied in a tracking algorithm can be 
summarized as follows: tracks are recorded by storing the information (such as current positions, 
velocities, and accelerations) that a Kalman filter needs to estimate the future position of each candidate 
target. When a new batch of position reports arrives, the existing tracks are projected forward to the time 
of the reports. An error ellipsoid is calculated for each track and each report, and a box is constructed 
around each ellipsoid. The boxes representing the track projections are organized into a multidimensional 
tree. Each box representing a report becomes the subject of a complete tree search; the result of the search 
is the set of all track boxes that intersect the given report box. Track-report pairs whose boxes do not 
intersect are excluded from all further consideration. Next the set of track-report pairs whose boxes do 
overlap is examined more closely to see whether the inscribed error ellipsoids also overlap. Whenever 
this calculation indicates a correlation, the track is projected to the time of the new report. Tracks that 
consistently fail to be associated with any reports are eventually deleted; reports that cannot be associated 
with any existing track initiate new tracks. 

The approach for multiple-target tracking described above ignores a plethora of intricate theoretical 
and practical details. Unfortunately, such details must eventually be addressed, and the SDI forced a 
generation of tracking, data fusion, and sensor system researchers to face all of the thorny issues and 
constraints of a real-world problem of immense scale. The goal was to develop a space-based system to 
defend against a full-scale missile attack against the U.S. Two of the most critical problems were the 
design and deployment of sensors to detect the launch of missiles at the earliest moment possible in their 
20-minute mid-course flight, and the design and deployment of weapons systems capable of destroying 
the detected missiles. Although an automatic tracking facility would clearly be an integral component of 
any SDI system, it was not generally considered a “high risk” technology. Tracking, especially of aircraft, 
had been widely studied for more than 30 years, so the tracking of nonmaneuvering ballistic missiles 
seemed to be a relatively simple engineering exercise. The principal constraint imposed by SDI was that 
the tracking be precise enough to predict a missile’s future position to within a few meters, so that it 
could be destroyed by a high-energy laser or a particle-beam weapon. 

The high-precision tracking requirement led to the development of highly detailed models of ballistic 
motion that took into account the effects of atmospheric drag and various gravitational perturbations 
over the earth. By far the most significant source of error in the tracking process, however, resulted from 
the limited resolution of existing sensors. This fact reinforced the widely held belief that the main obstacle 
to effective tracking was the relatively poor quality of sensor reports. The impact of large numbers of 
targets seemed manageable; just build larger, faster computers. Although many in the research community 
thought otherwise, the prevailing attitude among funding agencies was that if 100 objects could be tracked 
in real time, then little difficulty would be involved in building a machine that was 100 times faster — 
or simply having 100 machines run in parallel — to handle 10,000 objects. 

Among the challenges facing the SDI program, multiple-target tracking seemed far simpler than what 
would be required to further improve sensor resolution. This belief led to the awarding of contracts to build 
tracking systems in which the emphasis was placed on high precision at any cost in terms of computational 
efficiency. These systems did prove valuable for determining bounds on how accurately a single cluster of 
three to seven missiles could be tracked in an SDI environment, but ultimately pressures mounted to scale 
up to more realistic numbers. In one case, a tracker that had been tested on five missiles was scaled up to 
track 100, causing the processing time to increase from a couple of hours to almost a month of nonstop 
computation for a simulated 20-minute scenario. The bulk of the computations was later determined to 
have involved the correlation step, where reports were compared against hypothesis tracks. 

In response to a heightened interest in scaling issues, some researchers began to develop and study 
prototype systems based on efficient search structures. One of these systems demonstrated that 65 to 
100 missiles could be tracked in real time on a late-1980s personal workstation. These results were based 
on the assumption that a good-resolution radar report would be received every five seconds for every 
missile, which is unrealistic in the context of SDI; nevertheless, the demonstration did provide convincing 
evidence that SDI trackers could be adapted to avoid quadratic scaling. A tracker that had been installed 
at the SDI National Testbed in Colorado Springs achieved significant performance improvements after 
a tree-based search structure was installed in its correlation routine; the new algorithm was superior for 
as few as 40 missiles. Stand-alone tests showed that the search component could process 5,000 to 10,000 
range queries in real time on a modest computer workstation of the time. These results suggested that 
the problem of correlating vast numbers of tracks and reports had been solved. Unfortunately, a new 
difficulty was soon discovered. 

The academic formulation of the problem adopts the simplifying assumption that all position reports 
arrive in batches, with all the reports in a batch corresponding to measurements taken at the same instant 
of all of the targets. A real distributed sensor system would not work this way; reports would arrive in a 
continuing stream and would be distributed over time. In order to determine the probability that a given 
track and report correspond to the same object, the track must be projected to the measurement time 
of the report. If every track has to be projected to the measurement time of every report, the combinatorial 
advantage of the tree-search algorithm is lost. 

A simple way to avoid the projection of each track to the time of every report is to increase the search 
radius in the gating algorithm to account for the maximum distance an object could travel during the 
maximum time difference between any track and report. For example, if the maximum speed of a missile 
is 10 kilometers per second, and the maximum time difference between any report and track is five 
seconds, then 50 kilometers would have to be added to each search radius to ensure that no correlations 
are missed. For boxes used to approximate ellipsoids, this means that each side of the box must be 
increased by 100 kilometers. 
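
The arithmetic in this example can be checked with a short sketch. The function name `inflate_box` and the representation of a box as a list of per-axis (low, high) intervals are illustrative assumptions.

```python
def inflate_box(box, max_speed, max_dt):
    """Pad each side of a gating box by the farthest a target could travel.

    box is a list of (low, high) intervals, one per axis; max_speed and
    max_dt must use consistent units (here km/s and s).
    """
    pad = max_speed * max_dt
    return [(lo - pad, hi + pad) for lo, hi in box]

# 10 km/s for up to 5 s adds 50 km to the search radius, so every
# side of the box grows by 100 km in total.
box = [(0.0, 10.0), (0.0, 10.0), (0.0, 10.0)]
padded = inflate_box(box, max_speed=10.0, max_dt=5.0)
growth = [(hi - lo) - 10.0 for lo, hi in padded]
```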

As estimates of what constitutes a realistic SDI scenario became more accurate, members of the tracking 
community learned that successive reports of a particular target often would be separated by as much 
as 30 to 40 seconds. To account for such large time differences would require boxes so immense that the 
number of spurious returns would negate the benefits of efficient search. Demands for a sensor config- 
uration that would report on every target at intervals of 5 to 10 seconds were considered unreasonable 
for a variety of practical reasons. The use of sophisticated correlation algorithms seemed to have finally 
reached its limit. Several heuristic “fixes” were considered, but none solved the problem. 

A detailed scaling analysis of the problem ultimately pointed the way to a solution. Simply accumulate 
sensor reports until the difference between the measurement time of the current report and the earliest 
report exceeds a threshold. A search structure is then constructed from this set of reports, the tracks are 
projected to the mean time of the reports, and the correlation process is performed with the maximum 
time difference being no more than half of the chosen time-difference threshold. The subtle aspect of 
this deceptively simple approach is the selection of the threshold. If it is too small, every track will be 
projected to the measurement time of every report. If it is too large, every report will fall within the 
search volume of every track. A formula has been derived that, with only modest assumptions about the 
distribution of targets, ensures the optimal trade-off between these two extremes. 
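
The accumulation step can be sketched as a generator that closes a batch when the incoming report is more than a threshold later than the earliest report held. This is an illustration under stated assumptions: reports are assumed to arrive in nondecreasing time order, the name `batch_by_time` is hypothetical, and the threshold is simply a parameter because the formula for its optimal value is not reproduced in this chapter.

```python
def batch_by_time(reports, threshold):
    """Group (time, payload) reports into batches spanning at most `threshold`.

    Yields (mean_time, batch) pairs; the tracks would be projected to
    mean_time before the batch is correlated against them.
    """
    batch = []
    for t, payload in reports:
        # Close the current batch once the new report is too far from the earliest one.
        if batch and t - batch[0][0] > threshold:
            yield sum(bt for bt, _ in batch) / len(batch), batch
            batch = []
        batch.append((t, payload))
    if batch:
        # Flush whatever remains when the stream ends.
        yield sum(bt for bt, _ in batch) / len(batch), batch
```

In this sketch the report that triggers the flush starts the next batch, so no batch spans more than the threshold.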

Although empirical results confirm that the track file projection approach essentially solves the time 
difference problem in most practical applications, significant improvements are possible. For example, 
the fact that different tracks are updated at different times suggests that projecting all of the tracks at the 
same points in time may be wasteful. An alternative approach might take a track updated with a report 
at time $t_i$ and construct a search volume sufficiently large to guarantee that the track gates with any report 
of the target arriving during the subsequent $s$ seconds, where $s$ is a parameter similar to the threshold 
used for triggering track file projections. This is accomplished by determining the region of space the 
target could conceivably traverse based on its kinematic state and error covariance. The box circumscribing 
this search volume can then be maintained in the search structure until time $t_i + s$, at which point it 
becomes stale and must be replaced with a search volume that is valid from time $t_i + s$ to time $t_i + 2s$. 
However, if before becoming stale it is updated with a report at time $t_j$, $t_i < t_j < t_i + s$, then it must be 
replaced with a search volume that is valid from time $t_j$ to time $t_j + s$. 
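
One simple way to build such a time-extended search volume is sketched below. It is only an illustration: it assumes constant-velocity motion, treats each axis independently with separate position and velocity standard deviations (ignoring their correlation), and uses a hypothetical sigma multiplier `k`. A tracker with a detailed ballistic model would propagate the full state and covariance instead.

```python
def search_interval(pos, vel, sigma_pos, sigma_vel, s, k=3.0):
    """Interval on one axis covering the target for the next s seconds.

    The bounds are linear in elapsed time under the constant-velocity
    assumption, so checking the two endpoints (0 and s) is sufficient.
    """
    ends = []
    for tau in (0.0, s):
        centre = pos + vel * tau
        spread = k * (sigma_pos + sigma_vel * tau)
        ends += [centre - spread, centre + spread]
    return min(ends), max(ends)

def search_volume(pos, vel, sigma_pos, sigma_vel, s, k=3.0):
    """Axis-aligned box (list of intervals) valid for s seconds after an update."""
    return [search_interval(p, v, sp, sv, s, k)
            for p, v, sp, sv in zip(pos, vel, sigma_pos, sigma_vel)]
```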

The benefit of the enhanced approach is that each track is projected only at the times when it is updated 
or when an extended period has passed without an update (which could possibly signal the need to delete 
the track). In order to apply the approach, however, two conditions must be satisfied. First, there must 
be a mechanism for identifying when a track volume has become stale and needs to be recomputed. It 
is, of course, not possible to examine every track upon the receipt of each report because the scaling of 
the algorithm would be undermined. The solution is to maintain a priority queue of the times at which 
the different track volumes will become invalid. A priority queue is a data structure that can be updated 
efficiently and supports the retrieval of the minimum of n values in O(log n) time. At the time a report 
is received, the priority queue is queried to determine which, if any, of the track volumes have become 
stale. New search volumes are constructed for the identified tracks, and the times at which they will 
become invalid are updated in the priority queue. 
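
A priority queue of expiry times can be sketched with a binary heap. The class name `StaleQueue` and its methods are hypothetical; the sketch uses lazy deletion, so a heap entry superseded by a later update is simply skipped when it surfaces rather than being removed eagerly.

```python
import heapq

class StaleQueue:
    """Tracks the time at which each track's search volume becomes invalid."""

    def __init__(self):
        self._heap = []     # entries are (expiry_time, track_id)
        self._expiry = {}   # track_id -> currently valid expiry time

    def schedule(self, track_id, expiry_time):
        """Record a new expiry for a track; older entries become obsolete."""
        self._expiry[track_id] = expiry_time
        heapq.heappush(self._heap, (expiry_time, track_id))

    def remove(self, track_id):
        """Forget a deleted track; its heap entries are skipped later."""
        self._expiry.pop(track_id, None)

    def pop_stale(self, now):
        """Return ids of tracks whose search volumes have expired by `now`."""
        stale = []
        while self._heap and self._heap[0][0] <= now:
            expiry, track_id = heapq.heappop(self._heap)
            # Ignore entries superseded by a later schedule() or remove().
            if self._expiry.get(track_id) == expiry:
                del self._expiry[track_id]
                stale.append(track_id)
        return stale
```

On receipt of a report at time `now`, the tracks returned by `pop_stale(now)` would have new search volumes constructed and be passed to `schedule` again with their new expiry times.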

The second condition that must be satisfied for the enhanced approach is a capability to incrementally 
update the search structure as tracks are added, updated, recomputed, or deleted. The need for such a 
capability was hinted at in the discussion of dynamic search structures. Because the layered box tree 
supports insertions and deletions in O(log n) time, the update of a track’s search volume can be efficiently 
accommodated. The track’s associated box is deleted from the tree, an updated box is computed, and 
then the result is inserted back into the tree. In summary, the cost for processing each report involves 
updates of the search structure and the priority queue, at O(log n) cost, plus the cost of determining the 
set of tracks with which the report could be feasibly associated. 

3.4 Conclusion 



The correlation of reports with tracks numbering in the thousands can now be performed in real time 
on a personal computer. More research on large-scale correlation is needed, but work has already begun 
on implementing efficient correlation modules that can be incorporated into existing tracking systems. 
Ironically, by hiding the intricate details and complexities of the correlation process, these modules give 
the appearance that multiple-target tracking involves little more than the concurrent processing of several 
single-target problems. Thus, a paradigm with deep historical roots in the field of target tracking is at 
least partially preserved. 

Note that the techniques described in this chapter are applicable only to a very restricted class of 
tracking problems. Other problems, such as the tracking of military forces, demand more sophisticated 
approaches. Not only does the mean position of a military force change, its shape also changes. Moreover, 
reports of its position are really only reports of the positions of its parts, and various parts may be moving 
in different directions at any given instant. Filtering out the local deviations in motion to determine the 
net motion of the whole is beyond the capabilities of a simple Kalman filter. Other difficult tracking 
problems include the tracking of weather phenomena and soil erosion. The history of multiple-target 
tracking suggests that, in addition to new mathematical techniques, new algorithmic techniques will 
certainly be required for any practical solution to these problems. 

The joint use of imagery and spatial data from different imaging, mapping, or other spatial sensors has the 
potential to provide significant performance improvements over single sensor detection, classification, and 
situation assessment functions. The terms imagery fusion and spatial data fusion have been applied to 
describe a variety of combining operations for a wide range of image enhancement and understanding 
applications. Surveillance, robotic machine vision, and automatic target cueing are among the application 
areas that have explored the potential benefits of multiple sensor imagery. This chapter provides a framework 
for defining and describing the functions of image data fusion in the context of the Joint Directors of 
Laboratories (JDL) data fusion model. The chapter also describes representative methods and applications. 

* Adapted from "The principles and practice of image and spatial data fusion," in Proceedings of the 8th National 
Data Fusion Conference, Dallas, Texas, March 15-17, 1995, pp. 257-278. 

Sensor fusion and data fusion have become the de facto terms to describe the general abductive or 
deductive combination processes by which diverse sets of related data are joined or merged to produce 
a product that is greater than the individual parts. A range of mathematical operators has been applied 
to perform this process for a wide range of applications. Two areas that have received increasing research 
attention over the past decade are the processing of imagery (two-dimensional information) and spatial 
data (three-dimensional representations of real-world surfaces and objects that are imaged). These 
processes combine multiple data views into a composite set that incorporates the best attributes of all 
contributors. The most common product is a spatial (three-dimensional) model, or virtual world, which 
represents the best estimate of the real world as derived from all sensors. 

4.2 Motivations for Combining Image and Spatial Data 

A diverse range of applications has employed image data fusion to improve imaging and automatic 
detection/classification performance over that of single imaging sensors. Table 4.1 summarizes represen- 
tative and recent research and development in six key application areas. 

Satellite and airborne imagery used for military intelligence, photogrammetric, earth resources, and 
environmental assessments can be enhanced by combining registered data from different sensors to refine 
the spatial or spectral resolution of a composite image product. Registered imagery from different passes 
(multitemporal) and different sensors (multispectral and multiresolution) can be combined to produce 
composite imagery with spectral and spatial characteristics equal to or better than that of the individual 
contributors. 

Composite SPOT™ and LANDSAT satellite imagery and 3-D terrain relief composites of military 
regions demonstrate current military applications of such data for mission planning purposes. 1-3 The 
Joint National Intelligence Development Staff (JNIDS) pioneered the development of workstation-based 
systems to combine a variety of image and nonimage sources for intelligence analysts 4 who perform 

• registration — spatial alignment of overlapping images and maps to a common coordinate system; 

• mosaicking — registration of nonoverlapping, adjacent image sections to create a composite of a 
larger area; 

• 3-D mensuration-estimation — calibrated measurement of the spatial dimensions of objects 
within in-image data. 

TABLE 4.1 Representative Range of Activities Applying Spatial and Imagery Fusion 

                                        Activities                                          Sponsors

Satellite/Airborne Imaging
  Multiresolution image sharpening      Multiple algorithms, tools in commercial packages   U.S., commercial vendors
  Terrain visualization                 Battlefield visualization, mission planning         Army, Air Force
  Planetary visualization-exploration   Planetary mapping missions                          NASA

Mapping, Charting and Geodesy
  Geographic information system (GIS)   Terrain feature extraction, rapid map generation    DARPA, Army, Air Force
    generation from multiple sources
  Earth environment information system  Earth observing system, data integration system     NASA

Military Automatic Target Recognition (ATR)
  Battlefield surveillance              Various MMW/LADAR/FLIR                              Army
  Battlefield seekers                   Millimeter wave (MMW)/forward looking IR (FLIR)     Army, Air Force
  IMINT correlation                     Single Intel IMINT correlation                      DARPA
  IMINT-SIGINT/MTI correlation          Dynamic database                                    DARPA

Industrial Robotics
  3-D multisensor inspection            Product line inspection                             Commercial
  Non-destructive inspection            Image fusion analysis                               Air Force, commercial

Medical Imaging
  Human body visualization, diagnosis   Tomography, magnetic resonance imaging, 3-D fusion  Various R&D hospitals

Similar image functions have been incorporated into a variety of image processing systems, from 
tactical image systems such as the premier Joint Service Image Processing System (JSIPS) to Unix- and 
PC-based commercial image processing systems. Military services and the National Imagery and Mapping 
Agency (NIMA) are performing cross intelligence (i.e., IMINT and other intelligence source) data fusion 
research to link signals and human reports to spatial data. 5 

When the fusion process extends beyond imagery to include other spatial data sets, such as digital 
terrain data, demographic data, and complete geographic information system (GIS) data layers, numerous 
mapping applications may benefit. Military intelligence preparation of the battlefield (IPB) functions 
(e.g., area delimitation and transportation network identification), as well as wide area terrain database 
generation (e.g., precision GIS mapping), are complex mapping problems that require fusion to automate 
processes that are largely manual. One ambitious line of research in spatial data fusion is the 
U.S. Army Topographic Engineering Center’s (TEC) efforts to develop automatic terrain feature gener- 
ation techniques based on a wide range of source data, including imagery, map data, and remotely sensed 
terrain data. 6 On the broadest scale, NIMA’s Global Geospatial Information and Services (GGIS) vision 
includes spatial data fusion as a core functional element. 7 NIMA’s Mapping, Charting and Geodesy Utility 
Software package (MUSE), for example, combines vector and raster data to display base maps with 
overlays of a variety of data to support geographic analysis and mission planning. 

Real-time automatic target cueing/recognition (ATC/ATR) for military applications has turned to 
multiple sensor solutions to expand spectral diversity and target feature dimensionality, seeking to achieve 
high probabilities of correct detection/identification at acceptable false alarm rates. Forward-looking 
infrared (FLIR), imaging millimeter wave (MMW), and laser detection and ranging 
(LADAR) sensors are the most promising suite capable of providing the diversity needed for reliable 
discrimination in battlefield applications. In addition, some applications seek to combine the real-time 
imagery to present an enhanced image to the human operator for driving, control, and warning, as well 
as manual target recognition. 

Industrial robotic applications for fusion include the use of 3-D imaging and tactile sensors to provide 
sufficient image understanding to permit robotic manipulation of objects. These applications emphasize 
automatic object position understanding rather than recognition (e.g., the target recognition that is, by 
nature, noncooperative). 8 

Transportation applications combine millimeter wave and electro-optical imaging sensors to provide 
collision avoidance warning by sensing vehicles whose relative rates and locations pose a collision threat. 

Medical applications fuse information from a variety of imaging sensors to provide a complete 3-D 
model or enhanced 2-D image of the human body for diagnostic purposes. The United Medical and 
Dental Schools of Guy’s and St. Thomas’ Hospital (London, U.K.) have demonstrated methods for 
registering and combining magnetic resonance (MR), positron emission tomography (PET), and computed 
tomography (CT) into composites to aid surgery. 9 

4.3 Defining Image and Spatial Data Fusion 

In this chapter, image and spatial data fusion are distinguished as subsets of the more general data fusion 
problem that is typically aimed at associating and combining 3-D data about sparse point-objects located in 
space. Targets on a battlefield, aircraft in airspace, ships on the ocean surface, or submarines in the 3-D ocean 
volume are common examples of targets represented as point objects in a three-dimensional space model. 

Image data fusion, on the other hand, is involved with associating and combining complete, spatially 
filled sets of data in 2-D (images) or 3-D (terrain or high resolution spatial representations of real objects). 
Herein lies the distinction: image and spatial data fusion requires data representing every point on a 
surface or in space to be fused, rather than selected points of interest. 

FIGURE 4.1 Data fusion application taxonomy. 

The more general problem is described in detail in introductory texts by Waltz and Llinas 10 and Hall, 11 
while the progress in image and spatial data fusion is reported over a wide range of the technical literature, 
as cited in this chapter. 

The taxonomy in Figure 4.1 identifies the data properties and objectives that distinguish four 
categories of fusion applications. 

In all of the image and spatial applications cited above, the common thread of the fusion function is 
its emphasis on the following distinguishing functions: 

• Registration involves spatial and temporal alignment of physical items within imagery or spatial 
data sets and is a prerequisite for further operations. It can occur at the raw image level (i.e., any 
pixel in one image may be referenced with known accuracy to a pixel or pixels in another image, 
or to a coordinate in a map) or at higher levels, relating objects rather than individual pixels. Of 
importance to every approach to combining spatial data is the accuracy with which the data layers 
have been spatially aligned relative to each other or to a common coordinate system (e.g., geo- 
location or geo-coding of earth imagery to an earth projection). Registration can be performed 
by traditional internal image-to-image correlation techniques (when the images are from sensors 
with similar phenomena and are highly correlated) 12 or by external techniques. 13 External methods 
apply in-image control knowledge or as-sensed information that permits accurate modeling and 
estimation of the true location of each pixel in two- or three-dimensional space. 

• The combination function operates on multiple, registered “layers” of data to derive composite 
products using mathematical operators to perform integration; mosaicking; spatial or spectral 
refinement; spatial, spectral or temporal (change) detection; or classification. 

• Reasoning is the process by which intelligent, often iterative search operations are performed 
between the layers of data to assess the meaning of the entire scene at the highest level of abstraction 
and of individual items, events, and data contained in the layers. 

The image and spatial data fusion functions can be placed in the JDL data fusion model context to 
describe the architecture of a system that employs imagery data from multiple sensors and spatial data 
(e.g., maps and solid models) to perform detection, classification, and assessment of the meaning of 
information contained in the scenery of interest. 

FIGURE 4.2 An image data fusion functional flow can be directly compared to the Joint Directors of Laboratories (JDL) 
data fusion subpanel model of data fusion. 

Figure 4.2 compares the JDL general model 14 with a specific multisensor ATR image data fusion 
functional flow to show how the more abstract model can be related to a specific imagery fusion 
application. The Level 1 processing steps can be directly related to image counterparts: 

• Alignment — The alignment of data into a common time, space, and spectral reference frame 
involves spatial transformations to warp image data to a common coordinate system (e.g., pro- 
jection to an earth reference model or three-dimensional space). At this point, nonimaging data 
that can be spatially referenced (perhaps not to a point, but often to a region with a specified 
uncertainty) can then be associated with the image data. 

• Association — New data can be correlated with previous data to detect and segment (select) targets 
on the basis of motion (temporal change) or behavior (spatial change). In time-sequenced data 
sets, target objects at time t are associated with target objects at time t - 1 to discriminate newly 
appearing targets, moved targets, and disappearing targets. 

• Tracking — When objects are tracked in dynamic imagery, the dynamics of target motion are 
modeled and used to predict the future location of targets (at time t + 1) for comparison with 
new sensor observations. 

• Identification — The data for segmented targets are combined from multiple sensors (at any one 
of several levels) to provide an assignment of the target to one or more of several target classes. 
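
The association step in the list above can be sketched as a greedy nearest-neighbor match between target centroids at time t - 1 and time t. This is a simplified illustration: the function name, the distance threshold `max_dist`, and the greedy assignment rule are assumptions, and practical systems often use motion prediction and optimal assignment instead.

```python
import math

def associate_frames(prev, curr, max_dist):
    """Match target centroids between consecutive frames.

    prev and curr are lists of (x, y) centroids. Returns
    (matches, appeared, disappeared), where matches holds
    (prev_index, curr_index) pairs and the other two lists hold the
    indices of unmatched current and previous targets.
    """
    candidates = sorted(
        (math.dist(p, c), pi, ci)
        for pi, p in enumerate(prev)
        for ci, c in enumerate(curr)
        if math.dist(p, c) <= max_dist
    )
    used_prev, used_curr, matches = set(), set(), []
    for _, pi, ci in candidates:
        # Closest pairs claim their partners first.
        if pi not in used_prev and ci not in used_curr:
            used_prev.add(pi)
            used_curr.add(ci)
            matches.append((pi, ci))
    appeared = [ci for ci in range(len(curr)) if ci not in used_curr]
    disappeared = [pi for pi in range(len(prev)) if pi not in used_prev]
    return matches, appeared, disappeared
```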

Level 2 and 3 processing deals with the aggregate of targets in the scene and other characteristics of 
the scene to derive an assessment of the “meaning” of data in the scene or spatial data set. 

In the following sections, the primary image and spatial data fusion application areas are described 
to demonstrate the basic principles of fusion and the state of the practice in each area. 

4.4 Three Classic Levels of Combination for Multisensor 
Automatic Target Recognition Data Fusion 

Since the late 1970s, the ATR literature has adopted three levels of image data fusion as the basic design 
alternatives offered to the system designer. The terminology was adopted to describe the point in the 
traditional ATR processing chain at which registration and combination of different sensor data occurred. 
These functions can occur at multiple levels, as described later in this chapter. First, a brief overview of 
the basic alternatives and representative research and development results is presented. (Broad overviews 
of the developments in ATR in general, with specific comments on data fusion, are available in other 
literature. 15-17) 

FIGURE 4.3 Three basic levels of fusion are provided to the multisensor ATR designer as the most logical alternative 
points in the data chain for combining data. 

TABLE 4.2 Most Common Decision-Level Combination Alternatives 

Decision Type   Method               Description

Hard Decision   Boolean              Apply logical AND, OR to combine independent decisions.
                Weighted Sum Score   Weight sensors by inverse of covariance and sum to derive score function.
                M-of-N               Confirm decision based on m-out-of-n sensors that agree.

Soft Decision   Bayesian             Apply Bayes rule to combine sensor independent conditional probabilities.
                Dempster-Shafer      Apply Dempster's rule of combination to combine sensor belief functions.
                Fuzzy Variable       Combine fuzzy variables using fuzzy logic (AND, OR) to derive combined
                                     membership function.
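
Three of the decision-level alternatives listed in Table 4.2 (Boolean, M-of-N, and Bayesian combination) can be sketched in a few lines. The function names and the dictionary-based representation of class probabilities are illustrative assumptions, and the Bayesian rule shown assumes that the sensor reports are conditionally independent given the target class.

```python
def boolean_and(decisions):
    """Hard decision: declare a target only if every sensor does."""
    return all(decisions)

def boolean_or(decisions):
    """Hard decision: declare a target if any sensor does."""
    return any(decisions)

def m_of_n(decisions, m):
    """Hard decision: confirm when at least m of the n sensors agree."""
    return sum(bool(d) for d in decisions) >= m

def bayes_combine(prior, likelihoods):
    """Soft decision: posterior class probabilities from independent sensors.

    prior maps class -> P(class); each entry of likelihoods maps
    class -> P(sensor observation | class).
    """
    post = dict(prior)
    for sensor in likelihoods:
        for cls in post:
            post[cls] *= sensor[cls]
    total = sum(post.values())
    return {cls: p / total for cls, p in post.items()}

posterior = bayes_combine(
    {"tank": 0.5, "truck": 0.5},
    [{"tank": 0.8, "truck": 0.2}, {"tank": 0.6, "truck": 0.4}],
)
```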

4.4.1 Pixel-Level Fusion 

At the lowest level, pixel-level fusion uses the registered pixel data from all image sets to perform detection 
and discrimination functions. This level has the potential to achieve the greatest signal detection perfor- 
mance (if registration errors can be contained) at the highest computational expense. At this level, 
detection decisions (pertaining to the presence or absence of a target object) are based on the information 
from all sensors by evaluating the spatial and spectral data from all layers of the registered image data. 
A subset of this level of fusion is segment-level fusion, in which basic detection decisions are made 
independently in each sensor domain, but the segmentation of image regions is performed by evaluation 
of the registered data layers. 

Fusion at the pixel level involves accurate registration of the different sensor images before applying 
a combination operator to each set of registered pixels (which correspond to associated measurements 
in each sensor domain at the highest spatial resolution of the sensors). Spatial registration accuracies 
should be subpixel to avoid combination of unrelated data, making this approach the most sensitive to 
registration errors. Because image data may not be sampled at the same spacing, resampling and warping 
of images is generally required to achieve the necessary level of registration prior to combining pixel data. 
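
The resample-then-combine sequence can be sketched as follows. This is a deliberately minimal illustration: it assumes that the two images already cover the same footprint and differ only in sample spacing, it uses nearest-neighbor resampling in place of a full warp, and the function names and the choice of combination operator are hypothetical.

```python
def resample_nearest(image, out_rows, out_cols):
    """Resample a 2-D list-of-lists image to a new grid by nearest neighbor."""
    in_rows, in_cols = len(image), len(image[0])
    return [[image[r * in_rows // out_rows][c * in_cols // out_cols]
             for c in range(out_cols)]
            for r in range(out_rows)]

def fuse_pixels(a, b, op):
    """Apply a combination operator to each pair of registered pixels."""
    return [[op(pa, pb) for pa, pb in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]

coarse = [[1, 2],
          [3, 4]]
fine = [[2] * 4 for _ in range(4)]
# Bring the coarse image onto the fine grid, then combine pixel by pixel.
fused = fuse_pixels(resample_nearest(coarse, 4, 4), fine, min)
```

Any per-pixel operator can be passed as `op`, for example an average or a classifier applied to the pair of values.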

In the most direct 2-D image applications of this approach, coregistered pixel data may be classified 
on a pixel-by-pixel basis using approaches that have long been applied to multispectral data classifica- 
tion. 18 Typical ATR applications, however, pose a more complex problem when dissimilar sensors, such 
as FLIR and LADAR, image in different planes. In such cases, the sensor data must be projected into a 
common 2-D or 3-D space for combination. Gonzalez and Williams, for example, have described a 
process for using 3-D LADAR data to infer FLIR pixel locations in 3-D to estimate target pose prior to 
feature extraction. 19 Schwickerath and Beveridge present a thorough analysis of this problem, developing 
an eight-degree-of-freedom model to estimate both the target pose and relative sensor registration 
(coregistration) based on a 2-D and 3-D sensor. 20 

Delanoy et al. demonstrated pixel-level combination of spatial interest images using Boolean and fuzzy 
logic operators. 21 This process applies a spatial feature extractor to develop multiple interest images 
(representing the relative presence of spatial features in each pixel), before combining the interest images 
into a single detection image. Similarly, Hamilton and Kipp describe a probe-based technique that uses 
spatial templates to transform the direct image into probed images that enhance target features for 
comparison with reference templates. 22,23 Using a limited set of television and FLIR imagery, Duane 
compared pixel-level and feature-level fusion to quantify the relative improvement attributable to the 
pixel-level approach with well-registered imagery sets. 24 

4.4.2 Feature-Level Fusion 

At the intermediate level, feature-level fusion combines the features of objects that are detected and 
segmented in the individual sensor domains. This level presumes independent detectability of objects in 
all of the sensor domains. The features for each object are independently extracted in each domain; these 
features create a common feature space for object classification. 

Such feature-level fusion reduces the demand on registration, allowing each sensor channel to segment 
the target region and extract features without regard to the other sensor’s choice of target boundary. The 
features are merged into a common decision space only after a spatial association is made to determine 
that the features were extracted from objects whose centroids were spatially associated. 
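
The merge of per-sensor features after centroid association can be sketched as below. The function name, the distance threshold `max_dist`, and the nearest-centroid association rule are assumptions made for illustration; the sketch simply concatenates the two feature vectors of each associated pair to form a point in the common feature space.

```python
import math

def fuse_features(objs_a, objs_b, max_dist):
    """Concatenate features of objects whose centroids are spatially associated.

    objs_a and objs_b are lists of (centroid, features) tuples produced
    independently in each sensor domain. Objects with no partner within
    max_dist are left out of the fused set.
    """
    fused, used_b = [], set()
    for cen_a, feat_a in objs_a:
        best, best_d = None, max_dist
        for j, (cen_b, _) in enumerate(objs_b):
            d = math.dist(cen_a, cen_b)
            # Keep the closest unclaimed object from the second sensor.
            if j not in used_b and d <= best_d:
                best, best_d = j, d
        if best is not None:
            used_b.add(best)
            fused.append(list(feat_a) + list(objs_b[best][1]))
    return fused
```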

During the early 1990s, the Army evaluated a wide range of feature-level fusion algorithms for 
combining FLIR, MMW, and LADAR data for detecting battlefield targets under the Multi-Sensor Feature 
Level Fusion (MSFLF) Program of the OSD Multi-Sensor Aided Targeting Initiative. Early results dem- 
onstrated marginal gains over single sensor performance and reinforced the importance of careful 
selection of complementary features to specifically reduce single sensor ambiguities. 25 

At the feature level of fusion, researchers have developed model-based (or model-driven) alternatives 
to the traditional statistical methods, which are inherently data driven. Model-based approaches maintain 
target and sensing models that predict all possible views (and target configurations) for comparison with 
extracted features rather than using a more limited set of real signature data for comparison. 26 The 
application of model-based approaches to multiple-sensor ATR offers several alternative implementa- 
tions, two of which are described in Figure 4.4. The Adaptive Model Matching approach performs feature 
extraction (FE) and comparison (match) with predicted features for the estimated target pose. The process 
iteratively searches to find the best model match for the extracted features. 

4.4.2.1 Discrete Model Matching Approach 

A multisensor model-based matching approach described by Hamilton and Kipp 27 develops a relational 
tree structure (hierarchy) of 2-D silhouette templates. These templates capture the spatial structure of 
the most basic all-aspect target “blob” (at the top or root node), down to individual target hypotheses at 
specific poses and configurations. This predefined search tree is developed on the basis of model data 
for each sensor, and the ATR process compares segmented data to the tree, computing a composite score 
at each node to determine the path to the most likely hypotheses. At each node, the evidence is 
accumulated by applying an operator (e.g., weighted sum or Bayesian combination) to combine the scores 
for each sensor domain. 

FIGURE 4.4 Two model-based sensor alternatives demonstrate the use of a prestored hierarchy of model-based 
templates or an online, iterative model that predicts features based upon estimated target pose. 
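
A search of this kind can be sketched as a descent through a small template tree. The sketch below assumes that the per-sensor match scores at each node have already been computed from the segmented data, and it uses a weighted sum as the combining operator; the tree contents, node names, and weights are hypothetical.

```python
def fuse_scores(scores, weights):
    """Weighted-sum operator that combines per-sensor match scores at one node."""
    return sum(weights[s] * scores[s] for s in weights)

def search_tree(node, weights, path=()):
    """Descend from the root, following the best-scoring child at each level."""
    path = path + (node["name"],)
    if not node.get("children"):
        return path, fuse_scores(node["scores"], weights)
    best = max(node["children"], key=lambda c: fuse_scores(c["scores"], weights))
    return search_tree(best, weights, path)

# Hypothetical hierarchy: all-aspect "blob" at the root, pose hypotheses at the leaves.
tree = {
    "name": "blob", "scores": {"ir": 1.0, "mmw": 1.0},
    "children": [
        {"name": "tracked", "scores": {"ir": 0.7, "mmw": 0.6},
         "children": [
             {"name": "tank@0deg", "scores": {"ir": 0.8, "mmw": 0.5}},
             {"name": "tank@90deg", "scores": {"ir": 0.4, "mmw": 0.8}},
         ]},
        {"name": "wheeled", "scores": {"ir": 0.5, "mmw": 0.3}},
    ],
}
weights = {"ir": 0.5, "mmw": 0.5}

path, score = search_tree(tree, weights)
print(path, round(score, 3))
```

A greedy descent is the simplest choice; a practical system might keep several branches alive at each level so that an early low score in one sensor domain does not discard the correct hypothesis.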

4.4.2.2 Adaptive Model Matching Approach 

Rather than using prestored templates, this approach implements the sensor/target modeling capability 
within the ATR algorithm to dynamically predict features for direct comparison. Figure 4.4 illustrates a 
two-sensor extension of the one-sensor, model-based ATR paradigm (e.g., ARAGTAP 28 or MSTAR 29 
approaches) in which independent sensor features are predicted and compared iteratively, and evidence 
from the sensors is accumulated to derive a composite score for each target hypothesis. 

Larson et al. describe a model-based IR/LADAR fusion algorithm that performs extensive pixel-level 
registration and feature extraction before performing the model-based classification at the extracted feature 
level. 30 Similarly, Corbett et al. describe a model-based feature-level classifier that uses IR and MMW 
models to predict features for military vehicles. 31 Both of these follow the adaptive generation approach. 
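
The predict-compare-iterate loop can be illustrated with a toy example. The sketch below is not the ARAGTAP or MSTAR processing chain: the box-shaped target models, the two projected-extent features, and the shrinking-step pose search are assumptions chosen only to show how features are predicted online for each sensor and how the mismatch is accumulated across sensors.

```python
import math

# Hypothetical target models: (length, width) in meters.
MODELS = {"tank": (7.0, 3.5), "truck": (6.0, 2.5)}

def predict(model, pose_deg):
    """Predict one feature per sensor for a box-shaped target at an aspect angle."""
    length, width = model
    a = math.radians(pose_deg)
    return {
        "ir": length * abs(math.cos(a)) + width * abs(math.sin(a)),   # cross-range extent
        "mmw": length * abs(math.sin(a)) + width * abs(math.cos(a)),  # down-range extent
    }

def mismatch(model, pose_deg, observed):
    """Accumulate squared prediction error over all sensors."""
    pred = predict(model, pose_deg)
    return sum((pred[s] - observed[s]) ** 2 for s in observed)

def match(model, observed, pose_deg=45.0, step=16.0):
    """Iteratively refine the pose estimate with a shrinking-step local search."""
    err = mismatch(model, pose_deg, observed)
    while step >= 0.25:
        moved = False
        for cand in (pose_deg - step, pose_deg + step):
            e = mismatch(model, cand, observed)
            if e < err:
                pose_deg, err, moved = cand, e, True
        if not moved:
            step /= 2.0
    return pose_deg, err

# Simulated extracted features: a tank observed at a 30-degree aspect angle.
observed = predict(MODELS["tank"], 30.0)
results = {name: match(m, observed) for name, m in MODELS.items()}
best = min(results, key=lambda n: results[n][1])
print(best, results[best])
```

The composite score here is simply the summed squared error; any evidence-accumulation operator could be substituted without changing the structure of the loop.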

4.4.3 Decision-Level Fusion 

Fusion at the decision level (also called post-decision or post-detection fusion) combines the decisions of 
independent sensor detection/classification paths by Boolean (AND, OR) operators or by a heuristic 
score (e.g., M-of-N, maximum vote, or weighted sum). Two methods of making classification decisions 
exist: hard decisions (single, optimum choice) and soft decisions, in which decision uncertainty in each 
sensor chain is maintained and combined with a composite measure of uncertainty. 
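
The combination rules named above are simple to state in code. The following sketch shows a hard-decision M-of-N rule (of which AND and OR are the two extremes) and a soft-decision weighted sum; the confidences, weights, and threshold are hypothetical values chosen for illustration.

```python
def m_of_n(decisions, m):
    """Hard-decision rule: declare a target when at least m of the n sensor
    paths declare one (AND is m = n, OR is m = 1)."""
    return sum(decisions) >= m

def weighted_sum(confidences, weights, threshold):
    """Soft-decision rule: combine per-sensor confidences, then threshold."""
    score = sum(w * c for w, c in zip(weights, confidences))
    return score >= threshold, score

hard = [True, False, True]          # three independent sensor decisions
and_result = m_of_n(hard, 3)        # all three must agree
or_result = m_of_n(hard, 1)         # any one suffices
majority = m_of_n(hard, 2)          # 2-of-3 vote

soft = [0.9, 0.4, 0.7]              # per-sensor confidence in "target present"
fused, score = weighted_sum(soft, weights=[0.5, 0.2, 0.3], threshold=0.6)
print(and_result, or_result, majority, fused, round(score, 2))
```

The soft rule keeps each sensor's uncertainty until the final threshold, whereas the hard rules discard it at each sensor's own decision point.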

The relative performance of alternative combination rules and independent sensor thresholds can be 
optimally selected using distribution data for the features used by each sensor. 32 In decision-level fusion, 
each path must independently detect the presence of a candidate target and perform a classification on 
the candidate. These detections and/or classifications (the sensor decisions) are combined into a fused 
decision. This approach inherently assumes that the signals and signatures in each independent sensor 
chain are sufficient to perform independent detection before the sensor decisions are combined. This 
approach is much less sensitive to spatial misregistration than all others and permits accurate association 
of detected targets to occur with registration errors over an order of magnitude larger than for pixel- 
level fusion. Lee and Vleet have shown procedures for estimating the registration error between sensors 
to minimize the mean square registration error and optimize the association of objects in dissimilar 
images for decision-level fusion. 33 

Decision-level fusion of MMW and IR sensors has long been considered a prime candidate for 
achieving the level of detection performance required for autonomous precision-guided munitions. 34 
Results of an independent two-sensor (MMW and IR) analysis on military targets demonstrated the 
relative improvement of two-sensor decision-level fusion over either independent sensor. 35-37 A summary 
of ATR comparison methods was compiled by Diehl, Shields, and Hauter. 38 These studies demonstrated 
the critical sensitivity of performance gains to the relative performance of each contributing sensor and 
the independence of the sensed phenomena. 
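
The sensitivity to sensor performance and independence can be seen in a short calculation. If the two sensor decisions are statistically independent, the AND rule multiplies the individual probabilities, and the OR rule multiplies the individual miss probabilities. The operating points below are hypothetical and are not taken from the cited studies.

```python
def fuse_and(p1, p2):
    """Probability that both independent sensors declare (AND rule)."""
    return p1 * p2

def fuse_or(p1, p2):
    """Probability that at least one independent sensor declares (OR rule)."""
    return 1.0 - (1.0 - p1) * (1.0 - p2)

# Hypothetical operating points: (probability of detection, probability of false alarm).
mmw = (0.90, 0.10)
ir = (0.80, 0.05)

and_pd, and_pfa = fuse_and(mmw[0], ir[0]), fuse_and(mmw[1], ir[1])
or_pd, or_pfa = fuse_or(mmw[0], ir[0]), fuse_or(mmw[1], ir[1])
print(f"AND: Pd={and_pd:.3f} Pfa={and_pfa:.3f}")
print(f"OR:  Pd={or_pd:.3f} Pfa={or_pfa:.3f}")
```

With these numbers the AND rule cuts false alarms sharply at some cost in detection, and the OR rule does the reverse. If the second sensor is much weaker than the first, or if the two sensors tend to fail on the same targets, the products above no longer apply and the gain shrinks accordingly.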

4.4.4 Multiple-Level Fusion 

In addition to the three classic levels of fusion, other alternatives or combinations have been advanced. 
At a level even higher than the decision level, some researchers have defined scene-level methods in which 
target detections from a low-resolution sensor are used to cue a search-and-confirm action by a higher 
resolution sensor. Menon and Kolodzy described such a system, which uses FLIR detections to cue the 
analysis of high spatial resolution laser radar data using a nearest neighbor neural network classifier. 39 
Maren describes a scene structure method that combines information from hierarchical structures devel- 
oped independently by each sensor by decomposing the scene into element representations. 40 Others 
have developed hybrid, multilevel techniques that partition the detection problem to a high level (e.g., 
decision level) and the classification to a lower level. Aboutalib et al. described a hybrid algorithm that 
performs decision-level combination for detection (with detection threshold feedback) and feature-level 
classification for air target identification in IR and TV imagery. 41 

Other researchers have proposed multi-level ATR architectures, which perform fusion at all levels, 
carrying out an appropriate degree of combination at each level based on the ability of the combined 
information to contribute to an overall fusion objective. Chu and Aggarwal describe such a system that 
integrates pixel-level to scene-level algorithms. 42 Eggleston has long promoted such a knowledge-based 
ATR approach that combines data at three levels, using many partially redundant combination stages to 
reduce the errors of any single unreliable rule. 43,44 The three levels in this approach are 

• Low level — Pixel-level combinations are performed when image enhancement can aid higher- 
level combinations. The higher levels adaptively control this fine-grain combination. 

• Intermediate symbolic level — Symbolic representations (tokens) of attributes or features for 
segmented regions (image events) are combined using a symbolic level of description. 

• High level — The scene or context level of information is evaluated to determine the meaning of 
the overall scene, by considering all intermediate-level representations to derive a situation assess- 
ment. For example, this level may determine that a scene contains a brigade-sized military unit 
forming for attack. The derived situation can be used to adapt lower levels of processing to refine 
the high-level hypotheses. 

Bowman and DeYoung described an architecture that uses neural networks at all levels of the conven- 
tional ATR processing chain to achieve pixel-level performances of up to 0.99 probability of correct 
identification for battlefield targets using pixel-level neural network fusion of UV, visible, and MMW 
imagery. 45 

Pixel-, feature-, and decision-level fusion designs have focused on combining imagery for the purposes 
of detecting and classifying specific targets. The emphasis is on limiting processing by combining only the 
most likely regions of target data content and combining at the minimum necessary level to achieve the 
desired detection/classification performance. This differs significantly from the next category of image 
fusion designs, in which all data must be combined to form a new spatial data product that contains the 
best composite properties of all contributing sources of information. 



4.5 Image Data Fusion for Enhancement of Imagery Data 

Both still and moving image data can be combined from multiple sources to enhance desired features, 
combine multiresolution or differing sensor look geometries, mosaic multiple views, and reduce uncor- 
related noise. 

4.5.1 Multiresolution Imagery 

One area of enhancement has been in the application of band sharpening or multiresolution image fusion 
algorithms to combine differing resolution satellite imagery. The result is a composite product that 
enhances the spatial boundaries in lower resolution multispectral data using higher resolution panchro- 
matic or Synthetic Aperture Radar (SAR) data. 

Veridian-ERIM International has applied its Sparkle algorithm to the band sharpening problem, 
demonstrating the enhancement of lower-resolution SPOT™ multispectral imagery (20-meter ground 
sample distance, or GSD) with higher-resolution airborne SAR (3-meter GSD) and panchromatic 
photography (1-meter GSD) to sharpen the multispectral data. Radar backscatter features are overlaid on the 
composite to reveal important characteristics of the ground features and materials. The composite image 
preserves the spatial resolution of the panchromatic data, the spectral content of the multispectral layers, 
and the radar reflectivity of the SAR. 

Vrabel has reported the relative performance of a variety of band sharpening algorithms, concluding 
that Veridian ERIM International’s Sparkle algorithm and a color normalization (CN) technique provided 
the greatest GSD enhancement and overall utility. 46 Additional comparisons and applications of band 
sharpening techniques have been published in the literature. 47-50 
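
The general idea of band sharpening can be sketched with a generic ratio-based (Brovey-style) method. This is neither the Sparkle algorithm nor necessarily the CN technique in the cited comparison; it is a common textbook approach, shown under the assumptions that the images are already registered, that the panchromatic grid is an integer multiple of the multispectral grid, and that images are plain nested lists.

```python
def upsample(band, factor):
    """Nearest-neighbor replication of a low-resolution band onto the finer grid."""
    return [[band[r // factor][c // factor] for c in range(len(band[0]) * factor)]
            for r in range(len(band) * factor)]

def ratio_sharpen(ms_bands, pan, factor, eps=1e-9):
    """Scale each upsampled band by pan / (mean of all bands) at every fine pixel."""
    up = [upsample(b, factor) for b in ms_bands]
    rows, cols = len(pan), len(pan[0])
    out = []
    for b in up:
        out.append([[b[r][c] * pan[r][c] / (sum(u[r][c] for u in up) / len(up) + eps)
                     for c in range(cols)] for r in range(rows)])
    return out

# Hypothetical data: two 1 x 1 multispectral bands and a 2 x 2 panchromatic image.
ms_bands = [[[40.0]], [[60.0]]]
pan = [[40.0, 60.0], [50.0, 50.0]]

sharp = ratio_sharpen(ms_bands, pan, factor=2)
for band in sharp:
    print([[round(v, 2) for v in row] for row in band])
```

At every fine-grid pixel the ratio between bands is unchanged (the spectral content), while the mean across bands follows the panchromatic value (the spatial detail).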

Imagery can also be mosaicked by combining overlapping images into a common block, using classical 
photogrammetric techniques (bundle adjustment) that use absolute ground control points and tie points 
(common points in overlapped regions) to derive mapping polynomials. The data may then be forward 
resampled from the input images to the output projection or backward resampled by projecting the location 
of each output pixel onto each source image to extract pixels for resampling. 51 The latter approach permits 
spatial deconvolution functions to be applied in the resampling process. Radiometric feathering of the data 
in transition regions may also be necessary to provide a gradual transition after overall balancing of the 
radiometric dynamic range of the mosaicked image is performed. 52 Such mosaicking fusion processes have 
also been applied to three-dimensional data to create composite digital elevation models (DEMs) of terrain. 53 
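
Backward resampling can be sketched as follows: each output pixel is projected into the source image, and the source is interpolated at that (generally fractional) location. The bilinear interpolator and the first-order mapping used here are assumptions for illustration; in practice the mapping polynomials come from the bundle adjustment. The source image is assumed to be at least 2 x 2 pixels.

```python
import math

def bilinear(img, x, y):
    """Interpolate img at fractional (x, y); returns None outside the image."""
    rows, cols = len(img), len(img[0])
    if not (0 <= x <= cols - 1 and 0 <= y <= rows - 1):
        return None
    x0 = min(int(math.floor(x)), cols - 2)
    y0 = min(int(math.floor(y)), rows - 2)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x0 + 1] * fx
    bot = img[y0 + 1][x0] * (1 - fx) + img[y0 + 1][x0 + 1] * fx
    return top * (1 - fy) + bot * fy

def backward_resample(src, out_rows, out_cols, out_to_src):
    """For each output pixel, project into the source image and interpolate there."""
    out = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            x, y = out_to_src(c, r)
            row.append(bilinear(src, x, y))
        out.append(row)
    return out

src = [[0.0, 10.0, 20.0],
       [10.0, 20.0, 30.0],
       [20.0, 30.0, 40.0]]

# Hypothetical first-order mapping: a half-pixel shift in x and y.
shifted = backward_resample(src, 2, 2, lambda c, r: (c + 0.5, r + 0.5))
print(shifted)
```

Because the loop runs over output pixels, every output location receives exactly one value, which is why this form suits the application of deconvolution or other kernels during resampling.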

4.5.2 Dynamic Imagery 

In some applications, the goal is to combine different types of real-time video imagery to provide the 
clearest possible composite video image for a human operator. The David Sarnoff Research Center has 
applied wavelet encoding methods to selectively combine IR and visible video data into a composite 
video image that preserves the most desired characteristics (e.g., edges, lines, and boundaries) from each 
data set. 54 The Center later extended the technique to combine multitemporal and moving images into 
composite mosaic scenes that preserve the “best” data to create a current scene at the best possible 
resolution at any point in the scene. 55,56 
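
A selective wavelet-domain combination can be illustrated on a single image row. The sketch below is not the Sarnoff method; it assumes a one-level Haar transform, averages the coarse (approximation) coefficients, and keeps whichever detail coefficient has the larger magnitude, so that the stronger edge from either source survives. Input rows are assumed to have equal, even length.

```python
def haar_forward(signal):
    """One-level Haar transform: pairwise averages and pairwise half-differences."""
    approx = [(signal[i] + signal[i + 1]) / 2.0 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2.0 for i in range(0, len(signal), 2)]
    return approx, detail

def haar_inverse(approx, detail):
    """Rebuild the signal from its averages and half-differences."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([a + d, a - d])
    return out

def fuse(sig_a, sig_b):
    """Average the approximations; keep the larger-magnitude detail coefficient."""
    a_a, d_a = haar_forward(sig_a)
    a_b, d_b = haar_forward(sig_b)
    approx = [(x + y) / 2.0 for x, y in zip(a_a, a_b)]
    detail = [x if abs(x) >= abs(y) else y for x, y in zip(d_a, d_b)]
    return haar_inverse(approx, detail)

# Hypothetical rows: the IR row has an edge on the right, the visible row on the left.
ir_row = [10.0, 10.0, 10.0, 50.0]
vis_row = [20.0, 60.0, 20.0, 20.0]

fused = fuse(ir_row, vis_row)
print(fused)
```

Both edges appear in the fused row even though each source contains only one of them.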

4.5.3 Three-Dimensional Imagery 

Three-dimensional perspectives of the earth’s surface are a special class of image data fusion products 
that have been developed by draping orthorectified images of the earth’s surface over digital terrain 
models. The 3-D model can be viewed from arbitrary static perspectives or in a dynamic fly-through, which 
provides a visualization of the area for mission planners, pilots, or land planners. 

TABLE 4.3 Basic Image Data Fusion Functions Provided in Several Commercial Image Processing Software Packages 

| Category | Function | Description |
|---|---|---|
| Registration | Sensor-platform modeling | Model sensor-imaging geometry; derive correction transforms (e.g., polynomials) from collection parameters (e.g., ephemeris, pointing, and earth model) |
| Registration | Ground control point (GCP) calibration | Locate known GCPs and derive correction transforms |
| Registration | Warp to polynomial; orthorectify to digital terrain model | Spatially transform (warp) imagery to register pixels to regular grid or to a digital terrain model |
| Registration | Resample imagery | Resample warped imagery to create fixed pixel-sized image |
| Combination | Mosaic imagery | Register adjacent and overlapped imagery; resample to common pixel grid |
| Combination | Edge feathering | Combine overlapping imagery data to create smooth (feathered) magnitude transitions between two image components |
| Combination | Band sharpening | Enhance spatial boundaries (high-frequency content) in lower resolution band data using higher resolution registered imagery data in a different band |

Off-nadir regions of aerial or spaceborne imagery include a horizontal displacement error that is a 
function of the elevation of the terrain. A digital elevation model (DEM) is used to correct for these 
displacements in order to accurately overlay each image pixel on the corresponding post (i.e., terrain 
grid coordinate). Photogrammetric orthorectification functions 57 include the following steps to combine 
the data: 

• DEM preparation — The digital elevation model is transformed to the desired map projection for 
the final composite product. 

• Transform derivation — Platform, sensor, and DEM data are used to derive mapping polynomials 
that will remove the horizontal displacements caused by terrain relief, placing each input image 
pixel at the proper location on the DEM grid. 

• Resampling — The input imagery is resampled into the desired output map grid. 

• Output file creation — The resampled image data (x, y, and pixel values) and DEM (x, y, and z) 
are merged into a file with other geo-referenced data, if available. 

• Output product creation — Two-dimensional image maps may be created with map grid lines, 
or three-dimensional visualization perspectives can be created for viewing the terrain data from 
arbitrary viewing angles. 
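
The elevation-dependent displacement that these steps remove can be approximated with simple geometry. Under a flat-datum assumption, a point at height h above the datum, viewed at an angle θ from the vertical, appears shifted away from the nadir point by h·tan θ. The sketch below applies that correction to a single point; it is an illustration of the geometry only, and the function names and coordinates are hypothetical. A real orthorectification uses mapping polynomials derived from the platform and sensor models.

```python
import math

def relief_displacement(height_m, off_nadir_deg):
    """Horizontal shift of a point height_m above the datum, viewed
    off_nadir_deg away from the vertical (flat-datum approximation)."""
    return height_m * math.tan(math.radians(off_nadir_deg))

def orthorectify_point(apparent_xy, nadir_xy, height_m, off_nadir_deg):
    """Move an apparent ground position back toward nadir by the relief displacement."""
    dx = apparent_xy[0] - nadir_xy[0]
    dy = apparent_xy[1] - nadir_xy[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        return apparent_xy              # no displacement directly below the sensor
    scale = (dist - relief_displacement(height_m, off_nadir_deg)) / dist
    return (nadir_xy[0] + dx * scale, nadir_xy[1] + dy * scale)

# Hypothetical case: terrain 100 m above the datum, seen 45 degrees off nadir.
corrected = orthorectify_point((1000.0, 0.0), (0.0, 0.0), 100.0, 45.0)
print(corrected)
```

In a full implementation the height would be looked up from the DEM post at the corrected location, which generally requires a short iteration because the correct post is not known until the displacement has been removed.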

The basic functions necessary to perform registration and combination are provided in an increasing 
number of commercial image processing software packages (see Table 4.3), permitting users to fuse static 
image data for a variety of applications. 

4.6 Spatial Data Fusion Applications 

Robotic and transportation systems present a wide range of spatial data fusion problems similar to those 
in military applications. Robotics applications include relatively short-range, high-resolution imaging of cooperative 
target objects (e.g., an assembly component to be picked up and accurately placed) with the primary 
objectives of position determination and inspection. Transportation applications include longer-range 
sensing of vehicles for highway control and multiple sensor situation awareness within a vehicle to provide 
semi-autonomous navigation, collision avoidance, and control. 

The results of research in these areas are chronicled in a variety of sources, beginning with the 1987 
Workshop on Spatial Reasoning and MultiSensor Fusion, 58 and many subsequent SPIE conferences. 59-63 

4.6.1 Spatial Data Fusion: Combining Image and Non-Image Data to Create Spatial Information Systems 

One of the most sophisticated image fusion applications combines diverse sets of imagery (2-D), spatially 
referenced nonimage data sets, and 3-D spatial data sets into a composite spatial data information system. 
The most active area of research and development in this category of fusion problems is the development 
of geographic information systems (GIS) by combining earth imagery, maps, demographic and infra- 
structure or facilities mapping (geospatial) data into a common spatially referenced database. 

Applications for such capabilities exist in three areas. In civil government, the need for land and 
resource management has prompted intense interest in establishing GISs at all levels of government. The 
U.S. Federal Geographic Data Committee is tasked with the development of a National Spatial Data 
Infrastructure (NSDI), which establishes standards for organizing the vast amount of geospatial data 
currently available at the national level and coordinating the integration of future data. 64 

Commercial applications for geospatial data include land management, resources exploration, civil engi- 
neering, transportation network management, and automated mapping/facilities management for utilities. 

The military application of such spatial databases is the intelligence preparation of the battlefield 
(IPB), 65 which consists of developing a spatial database containing all terrain, transportation, ground- 
cover, manmade structures, and other features available for use in real-time situation assessment for 
command and control. The Defense Advanced Research Projects Agency (DARPA) Terrain Feature 
Generator is one example of a major spatial database and fusion function defined to automate the 
functions of IPB and geospatial database creation from diverse sensor sources and maps. 66 

Realizing efficient, affordable systems that can accommodate the volume of spatial data required 
for large regions and perform reasoning that produces accurate and insightful information depends 
on two critical technology areas: 

• Spatial Data Structure — Efficient, linked data structures are required to handle the wide variety 
of vector, raster, and nonspatial data sources. Hundreds of point, lineal, and areal features must 
be accommodated. Data volumes are measured in terabytes and short access times are demanded 
for even broad searches. 

• Spatial Reasoning — The ability to reason in the context of dynamically changing spatial data is 
required to assess the “meaning” of the data. The reasoning process must perform the following 
kinds of operations to make assessments about the data: 

• Spatial measurements (e.g., geometric, topological, proximity, and statistics) 

• Spatial modeling 

• Spatial combination and inference operations under uncertainty 

• Spatial aggregation of related entities 

• Multivariate spatial queries 
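
Two of the operations listed above, proximity measurement and a multivariate spatial query, can be sketched over a small linked structure of vector features. The `Feature` record, its attribute dictionary, and the query interface are hypothetical; the sketch handles point and polyline features only and ignores raster layers, areal features, and the indexing that terabyte-scale data would require.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Feature:
    name: str
    kind: str                       # "point" or "line"
    vertices: list                  # [(x, y), ...] in the common map projection
    attributes: dict = field(default_factory=dict)

def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    if dx == 0 and dy == 0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))

def distance_to(feature, p):
    """Proximity measurement from point p to a point or polyline feature."""
    v = feature.vertices
    if len(v) == 1:
        return math.hypot(p[0] - v[0][0], p[1] - v[0][1])
    return min(point_segment_distance(p, v[i], v[i + 1]) for i in range(len(v) - 1))

def query(features, p, max_dist, **attrs):
    """Multivariate spatial query: features near p whose attributes match attrs."""
    return [f.name for f in features
            if distance_to(f, p) <= max_dist
            and all(f.attributes.get(k) == val for k, val in attrs.items())]

layers = [
    Feature("road A", "line", [(0.0, 0.0), (10.0, 0.0)], {"surface": "paved"}),
    Feature("road B", "line", [(0.0, 5.0), (10.0, 5.0)], {"surface": "dirt"}),
    Feature("well", "point", [(3.0, 1.0)]),
]

hits = query(layers, (4.0, 1.0), 2.0, surface="paved")
print(hits)
```

The query combines a geometric condition with a nonspatial attribute condition, which is the sense in which such queries are multivariate.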

Antony surveyed the alternatives for representing spatial and spatially referenced semantic knowledge 67 
and published the first comprehensive data fusion text 68 that specifically focused on spatial reasoning for 
combining spatial data. 

4.6.2 Mapping, Charting and Geodesy (MC&G) Applications 

The use of remotely sensed image data to create image maps and generate GIS base maps has long been 
recognized as a means of automating map generation and updating to achieve currency as well as 
accuracy. 69-71 The following features characterize integrated geospatial systems: 

• Currency — Remote sensing inputs enable continuous update with change detection and 
monitoring of the information in the database. 

• Integration — Spatial data in a variety of formats (e.g., raster and vector data) is integrated with 
metadata and other spatially referenced data, such as text, numerical, tabular, and hypertext 
formats. Multiresolution and multiscale spatial data coexist, are linked, and share a common 
reference (i.e., map projection). 

• Access — The database permits spatial query access for multiple user disciplines. All data is 
traceable and the data accuracy, uncertainty, and entry time are annotated. 

• Display — Spatial visualization and query tools provide maximum human insight into the data 
content using display overlays and 3-D capability. 

FIGURE 4.5 The spatial data fusion process flow includes the generation of a spatial database and the assessment 
of spatial information in the database by multiple users. 

Ambitious examples of such geospatial systems include the DARPA Terrain Feature Generator, the 
European ESPRIT II MultiSource Image Processing System (MuSIP), 72,73 and NASA’s Earth Observing 
Systems Data and Information System (EOSDIS). 74 

Figure 4.5 illustrates the most basic functional flow of such a system, partitioning the data integration 
(i.e., database generation) function from the scene assessment function. The integration function 
spatially registers and links all data to a common spatial reference and also combines some data sets by 
mosaicking, creating composite layers, and extracting features to create feature layers. During the 
integration step, higher-level spatial reasoning is required to resolve conflicting data and to create derivative 
layers from extracted features. The output of this step is a registered, refined, and traceable spatial 
database. 

The next step is scene assessment, which can be performed for a variety of application functions (e.g., 
further feature extraction, target detection, quantitative assessment, or creation of vector layers) by a 
variety of user disciplines. This stage extracts information in the context of the scene, and is generally 
query driven. 

Table 4.4 summarizes the major kinds of registration, combination, and reasoning functions that are 
performed, illustrating the increasing levels of complexity in each level of spatial processing. Faust 
described the general principles for building such a geospatial database, the hierarchy of functions, and 
the concept for a blackboard architecture expert system to implement the functions described above. 75 

4.6.2.1 A Representative Example 

TABLE 4.4 Spatial Data Fusion Functions 

| | Registration | Combination | Reasoning |
|---|---|---|---|
| Data fusion functions | Image registration; image-to-terrain registration; orthorectification; image mosaicking, including radiometric balancing and feathering; multitemporal change detection | Multiresolution image sharpening; multispectral classification of registered imagery; image-to-image cueing; spatial detection via multiple layers of image data; feature extraction using multilayer data | Image-to-image cross-layer searches; feature finding (extraction by roaming across layers to increase detection, recognition, and confidence); context evaluation; image-to-nonimage cueing (e.g., IMINT to SIGINT); area delimitation |
| Examples | Coherent radar imagery change detection; SPOT™ imagery mosaicking; LANDSAT magnitude change detection | Multispectral image sharpening using panchromatic image; 3-D scene creation from multiple spatial sources | Area delimitation to search for critical target; automated map feature extraction; automated map feature updating |

Note: Complexity and processing increase from registration through combination to reasoning. Spatial data fusion functions include a wide variety of registration, combination, and reasoning processes and algorithms. 

FIGURE 4.6 Target search example uses multiple layers of spatial data and applies iterative spatial reasoning to 
evaluate alternative hypotheses while accumulating evidence for each candidate target. 

The spatial reasoning process can be illustrated by a hypothetical military example that follows the process 
an image or intelligence analyst might follow in search of critical mobile targets (CMTs). Consider the 
layers of a spatial database illustrated in Figure 4.6, in which recent unmanned air vehicle (UAV) SAR 
data (the top data layer) has been registered to all other layers, and the following process is performed 
(process steps correspond to path numbers on the figure): 

1. A target cueing algorithm searches the SAR imagery for candidate CMT targets, identifying 
potential targets in areas within the allowable area of a predefined delimitation mask (Data Layer 2).* 

2. Location of a candidate target is used to determine the distance to transportation networks (which 
are located in the map Data Layer 3) and to hypothesize feasible paths from the network to the 
hide site. 

3. The terrain model (Data Layer 8) is inspected along all paths to determine the feasibility that the 
CMT could traverse the path. Infeasible path hypotheses are pruned. 

4. Remaining feasible paths (on the basis of slope) are then inspected using the multispectral data 
(Data Layers 4, 5, 6, and 7). A multispectral classification algorithm is scanned over the feasible 
paths to assess ground load-bearing strength, vegetation cover, and other factors. Evidence is 
accumulated for slope and these factors (for each feasible path) and combined into a composite path 
likelihood value; unlikely paths are pruned. 

5. Remaining paths are inspected in the recent SAR data (Data Layer 1) for other significant evidence 
(e.g., support vehicles along the path, recent clear cut) that can support the hypothesis. Supportive 
evidence is accumulated to increase likelihood values. 

6. Composite evidence (target likelihood plus likelihood of feasible paths to candidate target hide 
location) is then used to make a final target detection decision. 

*This mask is a derived layer, produced by a spatial reasoning process in the scene generation stage, that delimits the 
entire search region to only those allowable regions in which a target may reside. 
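
The six steps can be sketched as a small evidence-accumulation routine. Everything numeric here is hypothetical: the slope limit, the pruning level, the product rule for combining soil and cover evidence, the fixed increment per supporting cue, and the decision threshold are assumptions made so that the flow of the reasoning can be followed, not values from any fielded system.

```python
def path_likelihood(path, max_slope_deg):
    """Steps 3 and 4: prune on slope, then combine soil and cover evidence."""
    if max(path["slopes_deg"]) > max_slope_deg:
        return 0.0                      # infeasible path hypothesis
    return path["load_bearing"] * path["cover"]

def detect(candidate, paths, mask, max_slope_deg=15.0, prune_below=0.2, threshold=0.5):
    """Accumulate evidence for one SAR candidate over the paths to its hide site."""
    if not mask[candidate["cell"]]:     # step 1: outside the delimitation mask
        return False, 0.0
    best = 0.0
    for path in paths:                  # step 2: hypothesized paths from the road network
        like = path_likelihood(path, max_slope_deg)
        if like < prune_below:
            continue                    # steps 3 and 4: prune infeasible or unlikely paths
        like = min(1.0, like + 0.1 * path["support_cues"])   # step 5: supporting SAR evidence
        best = max(best, like)
    score = candidate["cue_score"] * best                    # step 6: composite evidence
    return score >= threshold, score

mask = {(3, 4): True, (9, 9): False}
candidate = {"cell": (3, 4), "cue_score": 0.8}
paths = [
    {"slopes_deg": [4, 22, 6], "load_bearing": 0.9, "cover": 0.9, "support_cues": 0},
    {"slopes_deg": [3, 8, 5], "load_bearing": 0.8, "cover": 0.7, "support_cues": 2},
]

detected, score = detect(candidate, paths, mask)
print(detected, round(score, 3))
```

The first path is discarded on slope alone, so the decision rests on the second path and its two supporting cues.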

In the example presented in Figure 4.6, the reasoning process followed a spatial search to accumulate 
(or discount) evidence about a candidate target. In addition to target detection, similar processes can be 
used to 

• Insert data in the database (e.g., resolve conflicts between input sources), 

• Refine accuracy using data from multiple sources, 

• Monitor subtle changes between existing data and new measurements, and 

• Evaluate hypotheses about future actions (e.g., trafficability of paths, likelihood of flooding given 
rainfall conditions, and economy of construction alternatives). 

4.7 Summary 

The fusion of image and spatial data is an important process that promises to achieve new levels of 
performance and integration in a variety of application areas. By combining registered data from multiple 
sensors or views, and performing intelligent reasoning on the integrated data sets, fusion systems are 
beginning to significantly improve the performance of current generation automatic target recognition, 
single-sensor imaging, and geospatial data systems. 

5.1 Introduction 



Sensor fusion refers to the use of multiple sensor readings to infer a single piece of information. Inputs 
may be received from a single sensor over a period of time. They may be received from multiple sensors 
of the same or different types. Inputs may be raw data, extracted features, or higher-level decisions. This 
process provides increased robustness and accuracy in machine perception. This is conceptually similar 
to the use of repeated experiments to establish parameter values using statistics. 1 Several reference books 
have been published on sensor fusion. 2-4 

One decomposition of the sensor fusion process is shown in Figure 5.1. Sensor readings are gathered, 
preprocessed, compared, and combined, and a final result is derived. An essential preprocessing step for 
comparing readings from independent physical sensors is transforming all input data into a common 
coordinate system. This is referred to as data registration. In this chapter, we describe data registration, 
provide a review of existing methods, and discuss some recent results. 
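
When the registration transformation is known, bringing a reading into the common coordinate system is a single rigid transformation. The sketch below assumes two-dimensional readings and a sensor pose (position and heading) already expressed in the common frame; the poses and readings are hypothetical.

```python
import math

def to_common_frame(reading_xy, sensor_pose):
    """Map a point from a sensor's local frame into the common frame.

    sensor_pose = (x, y, heading_deg): the sensor's position and orientation
    expressed in the common frame (assumed known here).
    """
    sx, sy, heading_deg = sensor_pose
    a = math.radians(heading_deg)
    x, y = reading_xy
    return (sx + x * math.cos(a) - y * math.sin(a),
            sy + x * math.sin(a) + y * math.cos(a))

# Two sensors observe the same object from different positions and headings.
from_a = to_common_frame((5.0, 2.0), (0.0, 0.0, 0.0))
from_b = to_common_frame((2.0, 5.0), (10.0, 0.0, 90.0))
print(from_a, from_b)
```

Once both readings are expressed in the same frame they can be compared and combined. The registration problem discussed in the rest of the chapter is the harder case in which the pose parameters are not known in advance and must be estimated from the data.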

Data registration transformation is often assumed to be known a priori, partially because the problem 
is not trivial. Traditional methods are based on methods developed by cartographers. These methods 
have a number of drawbacks and often make invalid assumptions concerning the input data. 

Although data input includes raw sensor readings, features extracted from sensor data, and higher- 
level information, registration is a preprocessing stage and, therefore, is usually applied only to either 
raw data or extracted features. Sensor readings can have one to n dimensions. The number of dimensions 
will not necessarily be an integer. Most techniques deal with data of two or three dimensions; however, 
the same approaches can be trivially applied to one-dimensional readings. Depending on the sensing 
modalities used, occlusion may be a problem with data in more than two dimensions, causing data in the 
environment to be obscured by the relative position of objects in the environment. The specific case 
studies presented in this chapter use image data in two dimensions and range data in 2½ dimensions. 

This chapter is organized as follows. Section 5.2 gives a formal definition of image registration. Section 
5.3 provides a brief survey of existing methods. Section 5.4 discusses meta-heuristic techniques that have 
been used for image registration. This includes objective functions for sensor readings with various types 
of noise. Section 5.5 discusses a multiresolution implementation of image registration. Section 5.6 
provides a brief summary discussion. 

FIGURE 5.1 Decomposition of sensor fusion process. 

5.2 Registration Problem 

Competitive multiple sensor networks consist of a large number of physical sensors providing readings 
that are at least partially redundant. The first step in fusing multiple sensor readings is registering them 
to a common frame of reference. 5 “Registration” refers to finding the correct mapping of one image onto 
another. When an inaccurate estimate of the registration is known, finding the exact registration is referred 
to as refined registration. Another survey of image registration can be found in Brown. 6 

As shown in Figure 5.2, the general image registration problem is, given two $n$-dimensional sensor 
readings, to find the function $F$ that best maps the reading from sensor two, $S_2(x_1, \ldots, x_n)$, onto the reading 
from sensor one, $S_1(x_1, \ldots, x_n)$. Ideally, $F(S_2(x_1, \ldots, x_n)) = S_1(x_1, \ldots, x_n)$. Because all sensor readings contain 
some amount of measurement error or noise, the ideal case rarely occurs. 
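
Finding the best mapping can be illustrated with the most direct method, an exhaustive search. The sketch below restricts F to rotations by multiples of 90 degrees followed by integer pixel shifts, and it scores each candidate by the mean squared difference over the overlap. The restriction, the `min_overlap` guard, and the test images are assumptions for the example; the methods surveyed in this chapter exist precisely because such a search becomes impractical for finer parameter grids.

```python
def rotate90(img, k):
    """Rotate a 2-D list counterclockwise by k * 90 degrees."""
    for _ in range(k % 4):
        img = [list(row) for row in zip(*img)][::-1]
    return img

def mse_at(ref, obs, dx, dy, min_overlap):
    """Mean squared difference where obs, shifted by (dx, dy), overlaps ref."""
    total, n = 0.0, 0
    for r, row in enumerate(obs):
        for c, val in enumerate(row):
            rr, cc = r + dy, c + dx
            if 0 <= rr < len(ref) and 0 <= cc < len(ref[0]):
                total += (ref[rr][cc] - val) ** 2
                n += 1
    return total / n if n >= min_overlap else float("inf")

def register(ref, obs, max_shift, min_overlap):
    """Search rotations and shifts; return (error, quarter_turns, dx, dy)."""
    best = None
    for k in range(4):
        rot = rotate90(obs, k)
        for dy in range(-max_shift, max_shift + 1):
            for dx in range(-max_shift, max_shift + 1):
                err = mse_at(ref, rot, dx, dy, min_overlap)
                if best is None or err < best[0]:
                    best = (err, k, dx, dy)
    return best

# Reference reading S1, and an observed reading S2 that is a rotated patch of it.
ref = [[r * 4 + c for c in range(4)] for r in range(4)]
patch = [row[2:4] for row in ref[1:3]]
obs = rotate90(patch, 1)

result = register(ref, obs, max_shift=2, min_overlap=4)
print(result)
```

With noise-free data the search recovers a mapping with zero error; with noisy readings the minimum would be nonzero, which is the situation described in the text.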

Many processes require that data from one image, called the observed image, be compared with or 
mapped to another image, called the reference image. As a result, a wide range of critical applications 
depends on image registration. 

Perhaps the largest amount of image registration research is focused on medical imaging. One appli- 
cation is sensor fusion to combine outputs from several medical imaging technologies, such as PET and 
MRI, to form a more complete image of internal organs. 7 Registered images are then used for medical 
diagnosis of illness 8 and automated control of radiation therapy. 9 Similar applications of registered and 
fused images are common 11 in military applications (e.g. terrain “footprints”), 10 remote sensing applica- 
tions, and robotics. A novel application is registering portions of images to estimate motion. Descriptions 
of motion can then be used to construct intermediate images in television transmissions. Jain and Jain 
describe the applications of this to bandwidth reduction in video communications. 12 These are some of 
the more recent applications that rely on accurate image registration. Methods of image registration have 
been studied since the beginning of the field of cartography. 

[Figure 5.2: Given two images, an observed image $S_2$ and a reference image $S_1$, find the function $F$ that best maps the observed image to the reference image, $F(S_2) = S_1$. In the example shown, $F$ rotates the observed image 90 degrees and translates it 5 inches in the positive y direction.] 

FIGURE 5.2 Registration is finding the mapping function $F(S_2)$. 

TABLE 5.1 Image Registration Methods 

| Algorithm | Image Type | Matching Method | Interpolation Function | Transforms Supported | Comments |
|---|---|---|---|---|---|
| Andrus | Boundary maps | Correlation | None | Gruence | Noise intolerant, small rotations |
| Barnea | No restriction | Improved correlation | None | Translation | No rotation, scaling noise, rubber sheet |
| Barrow | No restriction | Hill climbing | Parametric chamfer | Gruence | Noise tolerant, small displacement |
| Brooks, Iyengar | No restriction | Elitist gen. alg. | None | Gruence | Noise tolerant, tolerates periodicity |
| Cox | Line segments | Hill climbing | None | Gruence | Matches using small number of features |
| Davis | Specific shapes | Relaxation | None | Affine | Matches shapes |
| Goshtasby 1986 | Control points | Various | Piecewise linear | Rubber sheet | Fits images using mapped points |
| Goshtasby 1987 | Control points | Various | Piecewise cubic | Rubber sheet | Fits images using mapped points |
| Goshtasby 1988 | Control points | Various | Least squares | Rubber sheet | Fits images using mapped points |
| Jain | Sub-images | Hill climbing | None | Translation | Small translations, no rotation, no noise |
| Mandara | Control points | Classic G.A.S.A. | Bi-linear | Rubber sheet | Fits 4 fixed points using error fitness |
| Mitiche | Control points | Least squares | None | Affine | Uses control points |
| Oghabian | Control points | Sequential search | Least squares | Rubber sheet | Assumes small displacement |
| Pinz | Control points | Tree search | None | Affine | Difficulty with local minima |
| Stockman | Control points | Cluster | None | Affine | Assumes landmarks, periodicity problem |
| Wong | Intensity differences | Exhaustive search | None | Affine | Uses edges, intense computation |

5.3 Review of Existing Research 







This section discusses the current state of research concerning image registration. Image registration is 
a basic problem in image processing, and a large number of methods have been proposed. 

Table 5.1 summarizes the features of the representative image registration methods discussed in this
section. The established methodologies and the algorithms currently in use are each explored in more
detail in the remainder of the section.

The traditional method of registering two images is an extension of methods used in cartography. A 
number of control points are found in both images. The control points are matched, and this match is 
used to deduce equations that interpolate all points in the new image to corresponding points in the 
reference image. 13,14 

Several algorithms exist for each phase of this process. Control points must be unique and easily 
identified in both images. Control points have been explicitly placed in the image by the experimenter, 9
or derived from edges defined by intensity changes, 15 specific points peculiar to a given image, 16 line
intersections, centers of gravity of closed regions, or points of high curvature. 13 The type of control point
that should be used primarily depends on the application and contents of the image. For example, in 
medical image processing, the contents of the image and approximate poses are generally known a priori. 

Similarly, many methods have been proposed for matching control points in the observed image to 
the control points in the reference image. The obvious method is to correlate a template of the observed 
image. 17,18 Another widely used approach is to calculate the transformation matrix that describes the
mapping with the least-squares error. 11,16,19 Other standard computational methods, such as relaxation
and hill-climbing, have also been used. 12,20,21 Pinz et al. use a hill-climbing algorithm to match images 
and note the difficulty posed by local minima in the search space; to overcome this, they run a number 
of attempts in parallel with different initial conditions. 22 
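
The least-squares approach mentioned above can be sketched as follows, assuming the control points have already been matched and that an affine model is adequate. This is an illustrative sketch, not the implementation used in any of the cited works; the function names `solve_linear` and `fit_affine` are hypothetical.

```python
def solve_linear(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented copy; A is not modified
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("control points are degenerate (e.g., collinear)")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][k] * z[k] for k in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z

def fit_affine(observed, reference):
    """Least-squares affine map taking observed control points to reference points.

    Returns (a, b, c, d, e, f) with x' = a*x + b*y + c and y' = d*x + e*y + f.
    """
    # Normal equations (A^T A) p = A^T t, where each row of A is [x, y, 1].
    ata = [[0.0] * 3 for _ in range(3)]
    atx = [0.0] * 3
    aty = [0.0] * 3
    for (x, y), (xr, yr) in zip(observed, reference):
        row = (x, y, 1.0)
        for i in range(3):
            for j in range(3):
                ata[i][j] += row[i] * row[j]
            atx[i] += row[i] * xr
            aty[i] += row[i] * yr
    a, b, c = solve_linear(ata, atx)
    d, e, f = solve_linear(ata, aty)
    return (a, b, c, d, e, f)
```

At least three non-collinear control point pairs are needed; with more pairs, the fit averages out some of the error in individual point locations.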

Some interesting methods have been implemented that consider all possible transformations. Stock- 
man et al. construct vectors between all pairs of control points in an image. 10 For each vector in each 
image, an affine transformation matrix is computed which converts the vector from the observed image 
to one of the vectors from the reference image. These transformations are then plotted, and the region 
containing the largest number of correspondences is assumed to contain the correct transformation. 10 
This method is computationally expensive because it considers the power set of control points in each 
image. Wong and Hall match scenes by extracting edges or intensity differences and constructing a tree 
of all possible matches that fall below a given error threshold. 15 They reduce the amount of computation 
needed by stopping all computation concerning a potential matching once the error threshold is exceeded; 
however, this method remains computationally intensive. Dai and Khorram extract affine transform 
invariant features based on the central moments of regions found in remote sensing images. 23 Regions 
are defined by zero-crossing points. Similarly, Yang and Cohen describe a moments-based method for 
registering images using affine transformations given sets of control points. 24 
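
The clustering idea attributed to Stockman et al. above can be sketched in simplified form. The sketch below is an assumption-laden illustration rather than their algorithm: it restricts the transformation to rotation plus translation instead of a general affine matrix, and it replaces the plotted clusters with fixed-size bins. The function name and the bin and tolerance parameters are hypothetical.

```python
import math
from collections import defaultdict
from itertools import permutations

def vote_rigid_transform(observed, reference,
                         angle_bin_deg=5.0, shift_bin=0.5, length_tol=0.05):
    """Let every vector pair vote for a (rotation, translation); return the densest bin.

    Returns (theta, tx, ty, votes), where the parameters are averaged over the winning bin.
    Raises ValueError if no vector pair has compatible lengths.
    """
    votes = defaultdict(list)
    for p1, p2 in permutations(observed, 2):
        vx, vy = p2[0] - p1[0], p2[1] - p1[1]
        for q1, q2 in permutations(reference, 2):
            wx, wy = q2[0] - q1[0], q2[1] - q1[1]
            # A rigid motion preserves length, so skip vector pairs that cannot match.
            if abs(math.hypot(vx, vy) - math.hypot(wx, wy)) > length_tol:
                continue
            theta = math.atan2(wy, wx) - math.atan2(vy, vx)
            theta = math.atan2(math.sin(theta), math.cos(theta))  # wrap into [-pi, pi]
            c, s = math.cos(theta), math.sin(theta)
            tx = q1[0] - (c * p1[0] - s * p1[1])
            ty = q1[1] - (s * p1[0] + c * p1[1])
            key = (round(math.degrees(theta) / angle_bin_deg),
                   round(tx / shift_bin), round(ty / shift_bin))
            votes[key].append((theta, tx, ty))
    best = max(votes.values(), key=len)
    n = len(best)
    return (sum(v[0] for v in best) / n,
            sum(v[1] for v in best) / n,
            sum(v[2] for v in best) / n,
            n)
```

The nested loops over all ordered point pairs in both images make the cost grow with the fourth power of the number of control points, which reflects the computational expense noted in the text.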

Registration of multisensor data to a three-dimensional scene, given a knowledge of the contents of 
the scene, is discussed by Chellappa. 25 The use of an extended Kalman filter (EKF) to register moving 
sensors in a sensor fusion problem is discussed by Zhou. 26 Mandara and Fitzpatrick have implemented 
a very interesting approach 8 using simulated annealing and genetic algorithm heuristics to find good 
matches between two images. They find a rubber sheet transformation, which fits two images by using 
linear interpolation around four control points, and assume that the images match approximately at the 
beginning. A similar approach has been espoused by Matsopoulos. 27 

A number of researchers have used multiresolution methods to prune the search space considered by 
their algorithms. Mandara and Fitzpatrick 8 use a multiresolution approach to reduce the size of their initial 
search space for registering medical images using simulated annealing and genetic algorithms. This work 
influenced Oghabian and Todd-Prokopek, who similarly reduced their search space when registering brain 
images with small displacements. 7 Pinz adjusted both multiresolution scale space and step size in order to 
reduce the computational complexity of a hill-climbing registration method. 22 These researchers believe 
that by starting with low-resolution images, they can reject large numbers of possible matches and find the 
correct match by progressively increasing the resolution. Note that in images with a strong periodic com- 
ponent, a number of low-resolution matches may be feasible. In such cases, the multiresolution approach 
will be unable to prune the search space and, instead, will increase the computational load. Another problem 
with a common multiresolution approach, the wavelet transform, is its sensitivity to translation. 28 
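
The coarse-to-fine pruning described in this paragraph can be sketched for the simplest case of a pure translation. The sketch assumes square images with sides divisible by 2 raised to the number of levels, 2x2 block averaging as the resolution reduction, and a mean squared difference as the match score; none of these choices is taken from the cited works, and all function names are hypothetical.

```python
def downsample(img):
    """Halve the resolution by averaging 2x2 blocks (image sides assumed even)."""
    h, w = len(img), len(img[0])
    return [[(img[2 * r][2 * c] + img[2 * r][2 * c + 1]
              + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4.0
             for c in range(w // 2)] for r in range(h // 2)]

def mismatch(ref, obs, dx, dy):
    """Mean squared difference over the overlap, comparing ref[r][c] with obs[r - dy][c - dx]."""
    h, w = len(ref), len(ref[0])
    total, count = 0.0, 0
    for r in range(h):
        for c in range(w):
            r2, c2 = r - dy, c - dx
            if 0 <= r2 < h and 0 <= c2 < w:
                total += (ref[r][c] - obs[r2][c2]) ** 2
                count += 1
    return total / count if count else float("inf")

def coarse_to_fine_shift(ref, obs, levels=2, radius=2):
    """Search a small window at the coarsest level, then refine around the doubled estimate."""
    pyramid = [(ref, obs)]
    for _ in range(levels):
        r, o = pyramid[-1]
        pyramid.append((downsample(r), downsample(o)))
    dx, dy = 0, 0
    for level in range(levels, -1, -1):
        r, o = pyramid[level]
        candidates = [(dx + i, dy + j)
                      for i in range(-radius, radius + 1)
                      for j in range(-radius, radius + 1)]
        dx, dy = min(candidates, key=lambda d: mismatch(r, o, d[0], d[1]))
        if level > 0:
            dx, dy = 2 * dx, 2 * dy  # one coarse pixel spans two pixels at the next finer level
    return dx, dy
```

Each level evaluates only (2 * radius + 1) squared candidates instead of every shift at full resolution. If the image is strongly periodic, several coarse candidates score equally well and the pruning is lost, as noted above.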

A number of methods have been proposed for fitting the entire image around the control points once 
an appropriate match has been found. Simple linear interpolation is computationally straightforward. 8 
Goshtasby has explored using a weighted least-squares approach, 19 constructing piecewise linear inter- 
polation functions within triangles defined by the control points, 13 and developing piecewise cubic 
interpolation functions. 29 These methods create nonaffine rubber sheet transformation functions to 
attempt to reduce the image distortion caused by either errors in control point matching, or differences 
in the sensors that constructed the image. 

Several algorithms exist for image registration. The algorithms described have some common draw- 
backs. The matching algorithms assume that a small number of distinct features can be matched, 10,16,30 
that specific shapes are to be matched, 31 that no rotation exists, or that the relative displacement is 
small. 7,8,12,16,17,21 Refer to Table 5.1 for a summary of many of these points. 

Choosing a small number of control points is not a trivial problem and has a number of inherent 
drawbacks. For example, the control point found may be a product of measurement noise. When two 
readings have more than a trivial relative displacement, control points in one image may not exist in the 
other image. This requires considering the power set of the control points. When an image contains 
periodic components, control points may not define a unique mapping of the observed image to the
reference image. Additional problems exist. The use of multiresolution cannot always trim the search
space and, if the image is dominated by periodic elements, it will only increase the computational
complexity of an algorithm. 7,8,22

Many algorithms attempt to minimize the square error over the image; however, this does not consider 
the influence of noise in the image. 7,8 Most of the existing methods are sensitive to noise. 7,16,17 Section 5.4 
discusses meta-heuristic-based methods, which try to overcome these drawbacks. Section 5.5 discusses
a multiresolution approach. 

5.4 Registration Using Meta-Heuristics 

This section discusses research on automatically finding a gruence (i.e., translation and rotation) registering 
two overlapping images. Results from this research have previously been presented in a number of sources. 2,32-35 
This approach attempts to correctly calibrate two two-dimensional sensor readings with identical geome- 
tries. These assumptions about the sensors can be made without a loss of generality because 

• A method that works for two readings can be extended to register any number of readings sequentially. 

• The majority of sensors work in one or two dimensions. Extending calibration methods to
more dimensions is desirable, but not imperative.

• Calibration of two sensors presupposes known sensor geometry. If geometries are known, a 
function can be derived that maps the readings as if the geometries were identical when a regis- 
tration is given. 

This approach finds gruences because these functions best represent the most common class of 
problems. The approach used can be directly extended to include the class of all affine transformations 
by adding scaling transformations. 36 It does not consider “rubber sheet” transformations that warp the 
contents of the image because these transformations mainly correct local effects after use of an affine 
transformation correctly matches the images. 14 It assumes that any rubber sheet deformations of the 
sensor image are known and corrected before the mapping function is applied, or that their effects over 
the image intersections are negligible. 

The computational examples used pertain to two sensors returning two-dimensional gray scale data 
from the same environment. The amount of noise and the relative positions of the two sensors are not 
known. Sensor two is translated and rotated by an unknown amount with relation to sensor one. 

If the size or content of the overlapping areas is known, a correlation using the contents of the overlap 
on the two images could find the point where they overlap directly. Use of central moments could also 
find relative rotation of the readings. When the size or content of the areas is unavailable, this approach 
is impossible. 

In this work, the two sensors have identical geometric characteristics. They return readings covering 
a circular region, and these readings overlap. Both sensors’ readings contain noise. What is not known, 
however, is the relative positions of the two sensors. Sensor two is translated and rotated by an unknown 
amount with relation to sensor one. 

The best way to solve this problem depends on the nature of the terrain being observed. If unique 
landmarks can be identified in both images, those points can be used as control points. Depending on 
the number of landmarks available, minor adjustments may be needed to fit the readings exactly. 
Goshtasby’s methods could be used at that point. 13,19,29 

Thus, the problem to be solved is, given noisy gray scale data readings from sensor one and sensor 
two, find the optimal set of parameters (x-displacement, y-displacement, and angle of rotation) that 
defines the center of the sensor two image relative to the center of the sensor one image. These parameters 
would provide the optimal mapping of sensor two readings to the readings from sensor one. This can 
be done using meta-heuristics for optimization. Brooks describes implementations of genetic algorithms, 
simulated annealing, and tabu search for this problem. 2 Chen applies TRUST, a subenergy tunneling 
approach from Oak Ridge National Laboratory. 35

To measure optimality, a fitness function can be used. The fitness function provides a numerical
measure of the goodness of a proposed answer to the registration problem. Brooks derives a fitness
function for sensor readings corrupted with Gaussian noise: 2

$$
\sum \frac{\left(\mathrm{read}_1(x,y) - \mathrm{read}_2(x',y')\right)^2}{K(w)^2}
= \sum \frac{\left(\mathrm{gray}_1(x,y) - \mathrm{gray}_2(x',y') + \mathrm{noise}_1(x,y) - \mathrm{noise}_2(x',y')\right)^2}{K(w)^2}
\qquad (5.1)
$$

where

w                                   is a point in the search space
K(w)                                is the number of pixels in the overlap for w
(x', y')                            is the point corresponding to (x, y)
read_1(x, y), read_2(x', y')        is the pixel value returned by sensor 1 (2) at point (x, y) ((x', y'))
gray_1(x, y), gray_2(x', y')        is the noiseless value for sensor 1 (2) at (x, y) ((x', y'))
noise_1(x, y), noise_2(x', y')      is the noise in the sensor 1 (2) reading at (x, y) ((x', y'))


The equation is derived by separating the sensor reading into information and additive noise compo- 
nents. This means the fitness function is made up of two components: (a) lack of fit, and (b) stochastic 
noise. The lack of fit component has a unique minimum when the two images have the same gray scale 
values in the overlap (i.e., when they are correctly registered). The noise component follows a Chi-squared 
distribution, whose expected value is proportional to the number of pixels in the region where the two 
sensor readings intersect. Dividing the squared difference by the cardinality of the overlap makes the
expected value of the noise factor constant. Dividing by the cardinality squared favors large intersections.
For a more detailed explanation of this derivation, see Brooks. 2 
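
The fitness computation described above can be sketched as follows. The sketch makes several assumptions that are not taken from Brooks: readings are stored as rectangular lists of rows rather than circular regions, the candidate w is a tuple (dx, dy, theta) applied about the array origin, and corresponding pixels are found by nearest-neighbor rounding. The function name `gaussian_fitness` is hypothetical.

```python
import math

def gaussian_fitness(read1, read2, w):
    """Sum of squared differences over the overlap, divided by the overlap size squared.

    w = (dx, dy, theta) is a candidate registration; lower values indicate a better fit.
    """
    dx, dy, theta = w
    c, s = math.cos(theta), math.sin(theta)
    h2, w2 = len(read2), len(read2[0])
    total, k = 0.0, 0
    for y, row in enumerate(read1):
        for x, value in enumerate(row):
            # Express (x, y) in sensor 2's frame: undo the shift, then undo the rotation.
            ux, uy = x - dx, y - dy
            x2 = int(round(c * ux + s * uy))
            y2 = int(round(-s * ux + c * uy))
            if 0 <= x2 < w2 and 0 <= y2 < h2:
                total += (value - read2[y2][x2]) ** 2
                k += 1  # k plays the role of K(w), the number of pixels in the overlap
    return total / (k * k) if k else float("inf")
```

Dividing by k squared rather than k means that, of two candidates with the same average squared difference, the one with the larger overlap receives the lower (better) score. A meta-heuristic such as a genetic algorithm would call this function once per candidate w.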

Other noise models simply modify the fitness function. Another common noise model addresses salt- 
and-pepper noise typically caused by either malfunctioning pixels in electronic cameras or dust in optical 
systems. In this model, the correct gray-scale value in a picture is replaced by a value of 0 (255) with an 
unknown probability p(q). An appropriate fitness function for this type of noise is Equation 5.2. 


$$
\sum_{\substack{\mathrm{read}_1(x,y) \neq 0 \\ \mathrm{read}_1(x,y) \neq 255 \\ \mathrm{read}_2(x',y') \neq 0 \\ \mathrm{read}_2(x',y') \neq 255}}
\frac{\left(\mathrm{read}_1(x,y) - \mathrm{read}_2(x',y')\right)^2}{K(w)^2}
\qquad (5.2)
$$
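
A minimal sketch of the salt-and-pepper fitness follows. It assumes the overlap has already been reduced to a list of corresponding pixel pairs and that K(w) counts every pixel in the overlap, including those excluded from the sum; both are assumptions made for illustration, as is the function name.

```python
def salt_pepper_fitness(pairs):
    """Squared differences over the overlap, skipping pixels saturated at 0 or 255.

    pairs is a list of (read1_value, read2_value) tuples for corresponding pixels.
    """
    k = len(pairs)  # assumed to play the role of K(w)
    if k == 0:
        return float("inf")
    total = sum((v1 - v2) ** 2
                for v1, v2 in pairs
                if v1 not in (0, 255) and v2 not in (0, 255))
    return total / (k * k)

# Two of the four pairs contain a saturated value and are ignored.
print(salt_pepper_fitness([(10, 10), (0, 200), (255, 30), (50, 52)]))
```

Pixels whose true gray level happens to be 0 or 255 are also skipped; the model accepts that loss in exchange for ignoring corrupted pixels.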



A similar function can be derived for uniform noise by using the expected value E[(U 1 - U 2 ) 2 ] of the 
squared difference of two uniform variables U 1 and U 2 . An appropriate fitness function is then given by 

Figure 5.3 shows the best fitness function value found by simulated annealing, elitist genetic algorithms,
classic genetic algorithms, and tabu search versus the number of iterations performed. In Brooks, elitist
genetic algorithms outperform the other methods attempted. Further work by Chen indicates that
TRUST is more efficient than the elitist genetic algorithms. 35 These studies show that optimization
techniques can work well on the problem, even in the presence of large amounts of noise. This is surprising
because the fitness functions take the difference of noise-corrupted data, which is essentially a derivative,
and derivatives are sensitive to noise. Further inspection of the fitness functions explains this surprising
result. Summing over the area of intersection is equivalent to integrating over the area of intersection.
Implicitly, integrating counteracts the derivative’s magnification of noise.

FIGURE 5.3 Fitness function results, variance 1. (Plot of the best fitness value versus iterations (x 25)
for elite GA, classic GA, tabu search, and simulated annealing.)

Matsopoulos uses affine, bilinear, and projective transformations to register medical images of the
retina. 27 The techniques tested include genetic algorithms, simulated annealing, and the downhill simplex 
method. They use image correlation as a fitness function. For their application, much preprocessing is 
necessary, which removes sensor noise. Their results indicate the superiority of genetic algorithms for 
automated image registration. This is consistent with Brooks’ results. 2,34 




5.5 Wavelet-Based Registration of Range Images 

This section uses range sensor readings. More details are provided by Grewe. 39 Range images consist of 
pixels with values corresponding to range or depth rather than photometric information. The range 
image represents a perspective of a three-dimensional world. The registration approach described herein 
can be trivially applied to other kinds of images, including one-dimensional readings. If desired, the 
approaches described by Brooks 2,34 and Chen 35 can be directly extended to include the class of all affine 
transformations by adding scaling transformations. This section discusses an approach for finding these 
transformations. 

The approach uses a multiresolution technique, the wavelet transform, to extract features used to 
register images. Other researchers have also applied wavelets to this problem, including using locally 
maximum wavelet coefficient values as features from two images. 37 The centroids of these features are 
used to compute the translation offset between the two images. A principal components analysis is then
performed and the eigenvectors of the covariance matrix provide an orthogonal reference system for 
computing the rotation between the two images. (This use of a simple centroid difference is subject to 
difficulties when the scenes only partially overlap and, hence, contain many other features.) 
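
The centroid and principal-axis computation described in this paragraph can be sketched for two-dimensional feature points. This is an illustrative sketch under the assumption that both feature sets cover the same scene content; it is not the cited implementation, and the function names are hypothetical.

```python
import math

def centroid(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def principal_angle(points):
    """Orientation (radians) of the leading eigenvector of the 2x2 covariance matrix."""
    cx, cy = centroid(points)
    sxx = sum((x - cx) ** 2 for x, _ in points)
    syy = sum((y - cy) ** 2 for _, y in points)
    sxy = sum((x - cx) * (y - cy) for x, y in points)
    return 0.5 * math.atan2(2.0 * sxy, sxx - syy)

def estimate_offset_and_rotation(features1, features2):
    """Return (centroid difference, rotation) relating features2 to features1."""
    c1, c2 = centroid(features1), centroid(features2)
    translation = (c1[0] - c2[0], c1[1] - c2[1])
    rotation = principal_angle(features1) - principal_angle(features2)
    # Eigenvectors carry no sign, so the rotation is only defined modulo pi.
    rotation = (rotation + math.pi / 2) % math.pi - math.pi / 2
    return translation, rotation
```

Because every feature contributes to the centroid and the covariance, features that appear in only one image bias both estimates, which is the partial-overlap difficulty noted in the parenthetical remark above.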

In another example, the wavelet transform is used to obtain a complexity index for two images. 38 The 
complexity measure is used to determine the amount of compression appropriate for the image. Com- 
pression is then performed, yielding a small number of control points. The images, made up of control 
points for rotations, are tested to determine the best fit. 

The system described in Grewe 39 is similar to some of the previous work discussed. Like DeVore, 38
Grewe uses wavelets to compress the amount of data used in registration. Unlike the previous wavelet-based
systems described by Sharman 37 and DeVore, 38 Grewe’s system 39 capitalizes on the hierarchical nature
of the wavelet domain to further reduce the amount of data used in registration. Options exist to perform a
hierarchical search or simply to perform registration inside one wavelet decomposition level. Other system
options include specifying an initial registration estimate, if known, and the choice of the wavelet
decomposition level in which to perform or start registration. At higher decomposition levels, the amount
of data is significantly reduced, but the resulting registration will be approximate. At lower decomposition
levels, the amount of data is reduced to a lesser extent, but the resulting registration is more exact. This
allows the user to choose between accuracy and speed as necessary.

FIGURE 5.4 Block diagram of WaveReg system. (Blocks recoverable from the figure: hierarchical
registration; retained registrations.)

Figure 5.4 shows a block diagram of the system. It consists of a number of phases, beginning with the 
transformation of the range image data to the wavelet domain. Registration can be performed on only 
one decomposition level of this space to reduce registration complexity. Alternatively, a hierarchical
registration across multiple levels will extract features from a wavelet decomposition level as a function of
a number of user-selected parameters, which determine the amount of compression desired in the level.
Matching features from the two range images are used to hypothesize the transformation between the 
two images and are evaluated. The “best” transformations are retained. This process is explained in the 
following paragraphs. 

First, a Daubechies-4 wavelet transform is applied to each range image. The wavelet data is compressed 
by thresholding the data to eliminate low magnitude wavelet coefficients. The wavelet transform produces 
a series of 3-D edge maps at different resolutions. A maximal wavelet value indicates a relatively sharp 
change in depth. 
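
The transform-and-threshold step can be sketched in one dimension. The sketch assumes that "Daubechies-4" refers to the four-tap Daubechies filter and uses periodic wrap-around at the boundaries; the actual system operates on two-dimensional range images, so this is only an illustration of the principle. The function names are hypothetical.

```python
import math

SQRT3 = math.sqrt(3.0)
NORM = 4.0 * math.sqrt(2.0)
# Four-tap Daubechies low-pass (scaling) filter and its high-pass (wavelet) mirror.
LOW = [(1 + SQRT3) / NORM, (3 + SQRT3) / NORM, (3 - SQRT3) / NORM, (1 - SQRT3) / NORM]
HIGH = [LOW[3], -LOW[2], LOW[1], -LOW[0]]

def d4_step(signal):
    """One decomposition level with periodic wrap-around; len(signal) must be even."""
    n = len(signal)
    approx, detail = [], []
    for i in range(0, n, 2):
        window = [signal[(i + k) % n] for k in range(4)]
        approx.append(sum(h * v for h, v in zip(LOW, window)))
        detail.append(sum(g * v for g, v in zip(HIGH, window)))
    return approx, detail

def threshold(coeffs, tau):
    """Compress by zeroing coefficients whose magnitude is below tau."""
    return [c if abs(c) >= tau else 0.0 for c in coeffs]

# A range profile with a single jump in depth (plus the jump created by wrap-around).
profile = [1.0] * 8 + [9.0] * 8
approx, detail = d4_step(profile)
print(threshold(detail, 1.0))
```

Only the detail coefficients whose windows straddle a jump survive thresholding, which is the sense in which large wavelet values mark sharp changes in depth. Applying `d4_step` again to `approx` yields the next, coarser decomposition level.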

Features, special points of interest in the wavelet domain, are simply points of maximum value in the 
current wavelet decomposition level under examination. These points are selected so that no two points 
are close to each other. The minimum distance is scaled with the changing wavelet level under exami- 
nation. Figure 5.5 shows features detected for different range scenes at different wavelet levels. Notice 
how these correspond to points of sharp change in depth. 
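
Feature selection with a minimum spacing can be sketched as a greedy procedure. The text does not state how the minimum distance is scaled with the wavelet level, so the halving rule in `min_dist_for_level` is an assumption, as are the function names and the use of coefficient magnitude as the feature strength.

```python
def select_features(coeffs, max_features, min_dist):
    """Greedily pick the largest-magnitude coefficients that are at least min_dist apart.

    coeffs is a 2-D list of (thresholded) wavelet coefficients for one level.
    Returns a list of (row, col, magnitude) tuples, strongest first.
    """
    candidates = sorted(
        ((abs(v), r, c)
         for r, row in enumerate(coeffs)
         for c, v in enumerate(row) if v != 0.0),
        reverse=True)
    chosen = []
    for mag, r, c in candidates:
        if all((r - pr) ** 2 + (c - pc) ** 2 >= min_dist ** 2 for _, pr, pc in chosen):
            chosen.append((mag, r, c))
            if len(chosen) == max_features:
                break
    return [(r, c, mag) for mag, r, c in chosen]

def min_dist_for_level(base_dist, level):
    """Assumed scaling: each coarser level halves the grid, so halve the spacing too."""
    return max(1.0, base_dist / (2 ** level))
```

Enforcing a minimum spacing keeps the feature set small and spread across the image, so that one strong edge does not contribute most of the features.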

Using a small number of feature points allows this approach to overcome the wavelet transform’s
sensitivity to translation. Stone 28 proposed another method for overcoming the sensitivity to translation. 
Stone noted that the low-pass portions of the wavelet transform are less sensitive to translation and that 
coarse to fine registration of images using the wavelet transform should be robust. 

The next stage involves hypothesizing correspondences between features extracted from the two unreg- 
istered range images. Each hypothesis represents a possible registration and is subsequently evaluated for 
its goodness. Registrations are compared and the best retained. 

Hypothesis formation begins at a default wavelet decomposition level. Registrations retained at this 
level are further “refined” at the next lower level, L-1. This process continues until the lowest level in the
wavelet space is reached. 

FIGURE 5.5 Features detected, approximate location indicated by white squares: (a) for wavelet Level 2 and (b)
for wavelet Level 1.

FIGURE 5.6 (a) Features extracted Level 1, Image 1, (b) Features extracted Level 1, Image 2, (c) Merged via
averaging registered images, (d) Merged via subtraction of registered images.

For each hypothesis, the corresponding geometric transformation relating the matched features is
calculated, and the remaining features from one range image are transformed into the other’s space. This
greatly reduces the computation involved in hypothesis evaluation in comparison to those systems that
perform non-feature-based registration. Next, features not part of the hypothesis are compared. Two
features match if they are close in value and location. Hypotheses are ranked by the number of features
matched and how closely the features match. Examples are given in Figure 5.6.
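
The evaluation and ranking of hypotheses can be sketched as follows. The sketch assumes each hypothesis is a rotation and translation (theta, tx, ty), each feature is a tuple (x, y, value), and fixed tolerances decide whether two features are close in location and value. The tolerances, the greedy one-to-one matching, and the function names are illustrative assumptions rather than details of the system described.

```python
import math

def score_hypothesis(transform, features1, features2, loc_tol=1.5, val_tol=0.2):
    """Map features2 into image 1's space; count matches and accumulate location error."""
    theta, tx, ty = transform
    c, s = math.cos(theta), math.sin(theta)
    unused = list(features1)
    matched, error = 0, 0.0
    for x, y, v in features2:
        mx, my = c * x - s * y + tx, s * x + c * y + ty
        best, best_d = None, loc_tol
        for f in unused:
            d = math.hypot(f[0] - mx, f[1] - my)
            if d <= best_d and abs(f[2] - v) <= val_tol:
                best, best_d = f, d
        if best is not None:
            unused.remove(best)  # each feature in image 1 can be matched only once
            matched += 1
            error += best_d
    return matched, error

def rank_hypotheses(hypotheses, features1, features2):
    """Sort hypotheses: more matched features first, then smaller location error."""
    scored = [(score_hypothesis(h, features1, features2), h) for h in hypotheses]
    scored.sort(key=lambda item: (-item[0][0], item[0][1]))
    return scored
```

Only the feature points are transformed and compared, not every pixel, which is where the reduction in computation relative to non-feature-based registration comes from.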

5.6 Registration Assistance/Preprocessing 

All of the registration techniques discussed herein operate on the basic premise that there is identical
content in the data sets being compared. However, the difficulty in registration pertains to the fact that
the content is the same semantically, but often not numerically. For example, sensor readings taken at
different times of the day can lead to lighting changes that can significantly alter the underlying data
values. Also, weather changes can lead to significant changes in data sets. Registration of these kinds of
data sets can be improved by first preprocessing the data. Figure 5.7 shows some preliminary work by
Grewe 40 on the process of altering one image to appear more like another image in terms of photometric
values. Such systems may improve registration systems of the future.

FIGURE 5.7 (a) Image 1, (b) Image 2, (c) New Image 1 corrected to appear more like Image 2 in photometric
content.

5.7 Conclusion 



Addressing the data registration problem is an essential preprocessing step in multisensor fusion. Data 
from multiple sensors must be transformed onto a common coordinate system. This chapter provided 
a survey of existing methods, including methods for finding registrations and applying registrations to 
data after they have been found. In addition, example approaches were described in detail. 

Brooks 2 and Chen 35 detail meta-heuristic-based optimization methods that can be applied to raw data. 
Of these methods, TRUST, a new meta-heuristic from Oak Ridge National Laboratory, is the most
promising. Fitness functions have been given for readings corrupted with Gaussian, uniform, and salt- 
and-pepper noise. Because these methods use raw data, they are computationally intensive. 

Grewe 39 presents a wavelet-based approach to registering range data. Features are extracted from the 
wavelet domain. A feedback approach is then applied to search for good registrations. Use of the wavelet 
domain compresses the amount of data that must be considered, providing for increased computational 
efficiency. Drawbacks to using feature-based methods have also been discussed in the chapter. 

6.1 Introduction



This chapter offers a conceptual-level view of the data fusion process and discusses key principles 
associated with both data analysis and information combination. The discussion begins with a high-level 
view of data fusion requirements and analysis options. Although the discussion focuses on tactical 
situation awareness development, a much wider range of applications exists for this technology. 

After motivating the concepts behind effective information combination and decision making through 
a series of easily understood metaphors, the chapter 

• Presents a top-down view of the data fusion process, 

• Discusses the inherent complexities of combining uncertain, erroneous, and fragmentary information, 

• Offers a taxonomic approach for distinguishing classes of fusion algorithms, and 

• Identifies key algorithm requirements for practical and effective machine-based reasoning. 

6.1.1 Biological Fusion Metaphor 

Multiple sensory fusion in biological systems provides a natural metaphor for studying artificial data 
fusion systems. As with any good metaphor, consideration of a simpler or more familiar phenomenon 
can provide valuable insight into the study of a more complex or less familiar process. 

Even the most primitive animals sense their environment, develop some level of situation awareness,
and react to the acquired information. Situation awareness directly supports survival of the species by
assisting in the acquisition of food and the avoidance of predators. A barn owl, for instance, fuses
visual and auditory information to help accurately locate mice under very low light conditions, while a
mouse responds to threatening visual and auditory cues to attempt to avoid being caught by an owl.

In general, natural selection has tended to favor the development of more capable senses (sensors) 
and more effective utilization of the derived information (exploitation and fusion). Color vision in 
humans, for instance, is believed to have been a natural adaptation that permitted apes to more easily 
locate ripe fruit among vegetation. Situation awareness in animals can rely on a single, highly developed 
sense, or on multiple, often less capable senses. A hawk depends principally on a highly acute visual 
search and tracking capability, while a shark primarily relies on its sense of smell when hunting. Sexual 
attraction can depend primarily on sight (plumage), smell (pheromones), or sound (mating call). For 
humans, sight is arguably the most vital sense, with hearing a close second. Dogs, on the other hand, 
rely most heavily on the senses of smell and hearing, with vision typically acting as a secondary infor- 
mation source. 

Sensory input in biological organisms typically supports both sensory cueing and situation awareness 
development. Sounds cue the visual sense to the presence and the general direction of an important 
event. Information gained by the aural sense (i.e., direction, speed, and tentative object classification) is 
then combined (fused) with the information gathered by the visual system to produce more complete, 
higher confidence, or higher level situation awareness. In many cases, multiple sensory fusion can be 
critical to successful decision making. Food that looks appetizing (sight) might be extremely salty (taste), 
spoiled (smell), or too hot (touch). At the other extreme, fusion of multiple sensory input might be 
unnecessary if the various senses provide highly redundant information. Bacon frying in a pan need not 
be seen, smelled, and tasted to be positively identified; each sense, taken separately, could perform such 
a function. 

Although discarding apparently redundant information may seem to be prudent, such information 
can aid in sorting out conflicts, both intentional (deception) and unintentional (confusion). While single- 
source deception is reasonably straightforward to perpetrate, deception across multiple senses (sensor 
modalities) is considerably more difficult. For example, successful hunting and fishing depend, to a large 
degree, on effective multisource deception. Duck hunters use both visual decoys and mating calls to 
simultaneously provide deceptive visual and auditory information. Because deer can sense danger through 
the sense of smell, sound, and sight, the shrewd hunter must mask his scent (or stay down-wind), make 
little or no noise, and remain motionless if the deer looks in his direction. Even in nonadversarial 
applications, data fusion requires resolution of unintentional conflicts among supporting data sources 
in order to deal effectively with the inherent uncertainty in both the measurement and decision spaces. 

Multiple sensory fusion need not be restricted to the familiar five senses of sight, sound, smell, taste, and
touch. Internal signals, such as acidity of the stomach, coupled with visual and/or olfactory cues, can trigger
hunger pangs. The fusion of vision, inner-ear balance information, and muscle feedback signals facilitates
motor control. In a similar manner, measurement and signature intelligence (MASINT) in a tactical
application focuses on the collection and analysis of a wide range of nontraditional information classes.

6.1.2 Command and Control Metaphor 

The game of chess provides a literal metaphor for military command and control (C2), as well as an
abstract metaphor for any system that senses and reacts to its environment. Both chess players and
battlefield commanders require a clear picture of the “playing field” to properly evaluate the options
available to them and their opponents. In both chess and C2, opposing players command numerous
individual resources (i.e., pieces or units) that possess a range of characteristics and capabilities. Resources 
and strategies vary over time. Groups of chess pieces are analogous to higher-level organizations on the 
battlefield. The chessboard represents domain constraints to movement that are similar to constraints 
posed by terrain, weather, logistics, and other features of the military problem domain. Player-specific 
strategies are analogous to tactics, while legal moves represent established doctrine. In both domains,
the overall objective of an opponent may be known, while specific tactics and subgoals must be deduced. 

Despite a chess player’s complete knowledge of the chess board (all domain constraints), the location 
of all pieces (own and opponent-force locations), and all legal moves (own and opponent-force doctrine),
and his ability to exercise direct control over all of his own assets, chess remains a highly challenging 
game. Metaphorically similar to chess, tactical situation development has numerous domain character- 
istics that make it an even more challenging problem. 

First, battlefield commanders normally possess neither a complete nor fully accurate picture of their 
own forces or those of their adversaries. Forced to deal with incomplete and inaccurate force structure 
knowledge, as well as location uncertainty, chess players would be reduced to guessing the location and 
composition of an adversary’s pieces, somewhat akin to playing “Battleship,” the popular children’s game. 

Second, individual sensors provide only limited observables, coverage, resolution, and accuracy. Thus, 
the analysis of individual sensor reports tends to lead to ambiguous and rather local interpretations. Third,
domain constraints in tactical situation awareness are considerably more complex than the well-struc- 
tured (and level) playing field in chess. Fourth, doctrinal knowledge in the tactical domain tends to be 
more difficult to exploit effectively and far less reliable than its counterpart in chess.

A wide range of other application-motivated metaphors can also be useful for studying specific fusion 
applications. Data fusion, for example, seems destined to play a significant role in the development of 
future “smart highway” control systems where a simple car driving metaphor can be applied to study 
sensor requirements and fusion opportunities. The underpinning of such a system is a sophisticated control 
capability that optimally resolves a range of conflicting requirements, such as (1) expediting the movement
of both local and long distance traffic, (2) ensuring maximum safety for all vehicles, and (3) minimizing
environmental impact. The actors in the metaphor are drivers (or automated vehicle control
systems), the rules of the game are the “rules of the road,” and domain constraints are the road network 
and traffic control means. Individual players possess individualized objectives and tactics; road charac- 
teristics and vehicle performance capabilities provide physical constraints on the problem solution. 

6.1.3 Puzzle-Solving Metaphor 

Situation awareness development requires the production and maintenance of an adequate multiple level- 
of-abstraction picture of a (dynamic) situation; therefore, the data fusion process can be compared to 
assembling a complex jigsaw puzzle for which no picture of the completed scene exists. While assembling 
puzzles that contain hundreds of pieces (information fragments) can challenge an individual’s skill and 
patience, the production of a comprehensive situational picture, created by fusing disparate and frag- 
mentary sensor-derived information, represents an even more challenging task. Although a completed 
jigsaw puzzle represents a fixed scene, the process of collecting and integrating the numerous information 
fragments clearly evolves over time. Time, on the other hand, represents a key dimension in highly 
dynamic tactical situation awareness applications. 

The partially completed puzzle (fused situation awareness product) illustrated in Figure 6.1 contains
numerous aggregate objects (e.g., forest and meadow), each composed of simpler objects (e.g., trees and
ground cover). Each of these objects, in turn, has been assembled from multiple puzzle pieces, some
representing a section of bark on a single tree trunk, others a grassy area associated with a meadow. In
terms of the metaphor, then, sensor-derived information can be associated with individual puzzle pieces
that provide little more information than color and texture, as well as with pieces that depict higher
level-of-abstraction objects.

At the beginning of the reconstruction process, problem solving necessarily relies on general analysis 
strategies (e.g., locate border pieces). Because little context exists to direct either puzzle piece selection 
or puzzle piece placement, at the early stages of the process, rather simple, brute-force pattern matching 
strategies are needed. A predominately blue-colored piece, for example, might represent either sky or 
water with little basis for distinguishing between the two interpretations. Unless they came from an
unopened box, there may be no assurance that the scattered pieces on the table all belong in the puzzle
under construction. However, once certain sections of the puzzle have been filled in, the assembly process
(fusion) tends to become much more goal-directed.

FIGURE 6.1 Puzzle-solving metaphor example.

Fitting a single puzzle piece supports both scene entropy reduction and higher level-of-abstraction
scene interpretation. As regions of the puzzle begin to take form, identifiable features in the scene emerge
(e.g., trees, grass, and cliffs) and higher-level interpretations can be developed (e.g., forest, meadows, and
mountains). By supporting the placement of the individual pieces, as well as the goal-driven search (sensor
resource management) for specific pieces, the context provided by the developing multiple level-of-abstraction
picture of the scene (situation awareness product) helps further focus the reconstruction
process (fusion process optimization).

Just as duplicate or erroneous pieces can significantly complicate puzzle assembly, redundant and 
irrelevant sensor-derived information similarly burdens machine-based situation development. There- 
fore, goal-directed information collection offers a two-fold benefit: critical information requirements are 
satisfied and the collection (and subsequent analysis) of unnecessary information is minimized. Although 
numerous puzzle pieces may be yet unplaced (undetected objects) and perhaps some pieces are actually
missing (information not collectible by the available sensor suite), a reasonably comprehensive, multiple
level-of-abstraction understanding of the overall scene (situation awareness) gradually emerges.

Three broad classes of knowledge are apparent in the puzzle reconstruction metaphor: 

• Individual puzzle pieces: collected information fragments (i.e., sensor-derived knowledge);

• Puzzle-solving strategies, such as edge detection and pattern matching: a priori reasoning
knowledge;

• World knowledge, such as the relationship between meadows and grass: domain context knowledge.

To investigate the critical role that each knowledge form plays in fusion product development, recast
the analysis in terms of a building construction metaphor. Puzzle pieces (sensor input) are clearly the
building blocks required to assemble the scene (fused situation awareness product). A priori reasoning
knowledge represents construction knowledge and skills, and context provides the nails and mortar that
“glue” the sensor input together to form a coherent whole. When too many puzzle pieces (or building
blocks) are missing (inadequate sensor-derived information), scene reconstruction (or building
construction) becomes difficult or impossible.

FIGURE 6.2 (a) Two-dimensional measurements and (b) the corresponding three-dimensional measurement space.

A simple example demonstrates how both the complexity of the fusion process and the quality of the 
resultant product are sensitive to the availability of adequate information. Figure 6.2(a) illustrates a cluster 
of azimuth and elevation measurements associated with two separate groups of air targets. Given the 
spatial overlap between the data sets, reliable target-to-group assignment may not be possible, regardless 
of the selected analysis paradigm or the extent of algorithm training. However, with the addition of range 
measurements (increased measurement space dimensionality), two easily separable clusters become readily
apparent (Figure 6.2(b)). Because the information content of the original 2-D data set was fundamentally 
inadequate, even sophisticated clustering algorithms would be unable to discriminate between the two 
target groups. However, with the addition of the third measurement dimension, a simple clustering 
algorithm easily handles the decision task. 
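
The effect of the added measurement dimension can be reproduced with a small sketch. The measurement values below are invented for illustration and are not the data of Figure 6.2; the `two_means` helper is a plain two-centroid k-means chosen only for its simplicity, and the features are left unscaled (degrees and kilometers) as a simplifying assumption.

```python
import math

# Hypothetical measurements: (azimuth deg, elevation deg, range km).
# The two groups overlap in azimuth/elevation but differ clearly in range.
group_a = [(10.0, 5.0, 20.0), (12.0, 6.0, 21.0), (11.0, 4.0, 19.0), (13.0, 5.5, 20.5)]
group_b = [(11.5, 5.2, 60.0), (10.5, 4.8, 61.0), (12.5, 5.8, 59.0), (11.0, 6.1, 60.5)]
points = group_a + group_b
truth = [0] * len(group_a) + [1] * len(group_b)


def two_means(data, iterations=20):
    """Plain k-means with k=2, seeded with the first and last points."""
    centroids = [data[0], data[-1]]
    labels = [0] * len(data)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        labels = [min((0, 1), key=lambda k: math.dist(p, centroids[k])) for p in data]
        # Move each centroid to the mean of its members.
        for k in (0, 1):
            members = [p for p, lab in zip(data, labels) if lab == k]
            if members:
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def agreement(labels, reference):
    """Fraction of points labeled consistently, allowing for swapped labels."""
    same = sum(a == b for a, b in zip(labels, reference)) / len(reference)
    return max(same, 1.0 - same)


score_2d = agreement(two_means([p[:2] for p in points]), truth)  # azimuth, elevation only
score_3d = agreement(two_means(points), truth)                   # range added
```

With these values the three-dimensional run recovers both groups exactly, whereas the two-dimensional run cannot, because one group lies inside the azimuth-elevation footprint of the other and no amount of algorithmic refinement can restore information that was never measured.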

Reasoning knowledge can be implemented using a spectrum of problem solving paradigms (e.g., rules, 
procedures, and statistical-based algorithms), evidence combination strategies (e.g., Bayes, Dempster- 
Shafer, and fuzzy set theory), and decision-making approaches (e.g., rule instantiation and parametric 
algorithms). In general, the process of solving a complex puzzle (or performing automated situation 
awareness) benefits from both bottom-up (deductive-based) and top-down (goal-directed) reasoning 
that exploits relationships among the hierarchy of domain entities (i.e., primitive, composite, aggregate, 
and organizational). 
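
As one illustration of an evidence combination strategy, the following sketch applies Dempster's rule of combination to the blue puzzle piece discussed earlier. The mass values, the two evidence sources, and the function name `dempster_combine` are assumptions made for this example only.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        overlap = a & b
        if overlap:
            combined[overlap] = combined.get(overlap, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass that falls on contradictory hypotheses
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalize so the remaining masses sum to one.
    return {h: w / (1.0 - conflict) for h, w in combined.items()}


SKY, WATER = frozenset({"sky"}), frozenset({"water"})
EITHER = SKY | WATER

# Hypothetical masses: color alone is nearly uninformative,
# while the neighboring pieces (context) favor sky.
color_cue = {SKY: 0.1, WATER: 0.1, EITHER: 0.8}
context_cue = {SKY: 0.6, WATER: 0.1, EITHER: 0.3}

fused = dempster_combine(color_cue, context_cue)
```

Under these assumed masses, the combined belief in "sky" rises slightly above that of the context cue alone, while the mass left on the undecided set shrinks. A Bayesian or fuzzy-set formulation would be an equally legitimate choice for the same task.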

In the puzzle-solving metaphor, context knowledge refers to relevant domain knowledge not explicitly 
contained within a puzzle piece (non-sensor-derived knowledge). Humans routinely apply a wide range
of contextual knowledge during analysis and decision making.* For example, context-sensitive evaluation 
of Figure 6.1 permits the determination that the picture is a summer scene in the western U.S. The season 
and location are deduced from the presence of deciduous trees in full leaf (summer) in the foreground 
and jagged snow-capped mountain peaks in the distance (western U.S.). In a similar fashion, the exploi- 
tation of context knowledge in automated fusion systems can promote much more effective and com- 
prehensive interpretations of sensor-derived information. 

In both puzzle assembly and automated situation development, determining when an adequate situ- 
ation representation has been achieved can be difficult. In the puzzle reconstruction problem, although 
the general landscape characteristics might be evident, missing puzzle pieces could depict denizens of 
the woodland community that can be hypothesized, but for which no compelling evidence yet exists. 
On the other hand, individual puzzle pieces might contain partial or ambiguous information. For
example, the presence of a section of log wall in the evolving scene suggests the possibility of a log cabin.
However, additional evidence is required to validate such a hypothesis.

* This fact partially accounts for the disparity in performance between manual and automated approaches to data
fusion.

6.1.4 Evidence Combination 

Reliance on a single information source can lead to ambiguous, uncertain, and inaccurate situation 
awareness. Data fusion seeks to overcome such limitations by synergistically combining all relevant (and 
available) information sources leading to the generation of consistent, accurate, comprehensive, and 
global situation awareness. A famous poem by John Godfrey Saxe,* written more than a century ago, 
aptly demonstrates both the need for and challenge of effectively combining fragmentary information. 

The poem describes an attempt by six blind men to gain a first-hand understanding of an elephant. 
The first man happens to approach the elephant from the side and surmises that an elephant must be 
something like a wall. The second man touches the tusk and imagines an elephant to be like a spear. The 
third man approaches the trunk and decides an elephant is similar to a snake. The fourth man reaches 
out and touches a leg and determines an elephant to be much like a tree. The fifth man chances to touch 
an ear and imagines an elephant must be like a fan. The sixth man grabs the tail and concludes an 
elephant is similar to a rope. While each man’s assessment is entirely consistent within his own limited 
sensory space and myopic frame of reference, unless the six observations are effectively integrated (fused),
a true picture of an elephant fails to emerge. 

Among other insights, the puzzle-solving metaphor illustrated that (1) complex dependencies can 
exist among and between information fragments and the completed situation description, and 
(2) determining whether an individual puzzle piece actually belongs to the scene being assembled can 
be difficult. Even when the collected information is known to be relevant, based strictly on local inter- 
pretations, determining whether a given blue-colored piece represents sky, water, or some other feature 
class may not be possible. General situation awareness systems require an approach to information
combination much like that of a criminal investigation, which involves assembling observations, hunting
for clues, and evaluating motives. At the outset of an investigation, a single strand of
hair might appear insignificant, yet it could later prove to be the key piece of evidence that discriminates
among several suspects. Similarly, a seemingly irrelevant piece of sensor-derived information might
ultimately link observations with motives, or provide other significant situational awareness benefits.
Thus, not only is the information content (information measure) associated with a given piece of data
important; its relationship to the overall fusion task is also vital to achieving successful information
fusion. As a direct consequence of this observation, the development of a comprehensive
information-theoretic framework for the data fusion process appears to be problematic. Only through a top-down,
holistic treatment of the analysis task can the content of a single information fragment be properly 
assessed and its true value to the overall fusion process be fully realized. 

6.1.5 Information Requirements 

Because no widely accepted formal theory exists for determining when adequate information has been 
assembled to support a given fusion task, empirical measures of performance generally must be relied 
upon to evaluate the effectiveness of both individual fusion algorithms and an overall fusion system. In 
general, data fusion performance can be enhanced by 

• Technical improvements in sensor measurements (i.e., longer range, higher resolution, improved 
signal-to-noise ratio, better accuracy, higher reliability); 

• Increased measurement space dimensionality afforded by heterogeneous sensors that provide at
least partially independent information;

• Spatially distributed sensors providing improved coverage, perspective, and measurement reliability;

• Relevant non-sensor-derived domain knowledge to constrain the information combination and
decision-making process.

* Saxe, J. G., “The Blind Men and the Elephant,” The Poetical Works of John Godfrey Saxe, Boston, MA: Houghton,
Mifflin and Company, 1882.

In general, effective data fusion automation requires the development of robust, context-sensitive 
algorithms that are practical to implement. The first two requirements reflect the “quality of performance” 
of the algorithm, while the last reflects cost/benefit tradeoffs associated with meeting a wide range of
implicit and explicit performance objectives. In general, robust performance argues for the use of all 
potentially relevant sensor-derived information sources and reasoning knowledge. Achieving context- 
sensitive performance argues for maximal utilization of relevant non-sensor-derived information. On the 
other hand, to be practical to implement and efficient enough to employ in an operational setting, the 
algorithms may need to compromise some fusion performance quality. Consequently, system developers 
must quantify or otherwise assess the value of these various information sources in light of system 
requirements, moderated by programmatic, budgetary, and performance constraints (e.g., decision time- 
line and hardware capability). The interplay between achieving optimal algorithm robustness and context- 
sensitivity, on the one hand, and a practical implementation, on the other, is a fundamental tension 
associated with virtually any form of machine-based reasoning directed at solving complex, real-world 
problems. 

6.1.6 Problem Dimensionality 

Effective situational awareness, with or without intentional deception, generally benefits from the col- 
lection and analysis of a wide range of observables. As a result of the dynamic nature of many problem 
domains, observables can change with time and, in some cases, may require continuous monitoring. In 
a tactical application, objects of interest can be stationary (fixed or currently nonmoving), quasistationary 
(highly localized motion), or moving. Individual objects possess characteristics that constrain their
behavior. Objects emit different forms of energy that vary with time and can indicate the state
of the object. Object emissions include intentional or active emissions, such as radar, communications, 
and data link signals, as well as unintentional or passive emissions, such as acoustic, magnetic, or thermal 
signatures generated by internal heat sources or environmental loading. Patterns of physical objects and 
their behavior provide indications of organization, tactics, and intent. Patterns of emissions, both active 
and passive, can reveal the same. For example, a sequence of signals emitted from a surface-to-air missile 
radar over time representing search, lock-on, launch, and hand-over clearly indicates hostile intent. 
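
The emission-pattern example above can be sketched as an ordered-sequence check. The mode labels, the `matched_stages` helper, and the report stream are hypothetical; an operational system would work from classified emitter modes with associated confidence values rather than clean strings.

```python
HOSTILE_SEQUENCE = ("search", "lock-on", "launch", "hand-over")


def matched_stages(observed_modes, pattern=HOSTILE_SEQUENCE):
    """Count how many leading stages of `pattern` appear, in order, in the observations."""
    stage = 0
    for mode in observed_modes:
        if stage < len(pattern) and mode == pattern[stage]:
            stage += 1
    return stage


# Hypothetical time-ordered radar mode reports for a single emitter.
reports = ["search", "search", "lock-on", "search", "lock-on", "launch", "hand-over"]

stages = matched_stages(reports)
hostile = stages == len(HOSTILE_SEQUENCE)
```

Returning the number of matched stages, rather than a simple yes or no, allows a partial match (e.g., search followed by lock-on) to be reported as an early warning.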

A single sensor modality is incapable of measuring all relevant information dimensions; therefore, 
multiple sensor classes often must be relied upon to detect, track, classify, and infer the likely intent of 
a host of objects, from submarines and surface vessels, to land, air, and space-based objects. Certain 
sensor classes lend themselves to surveillance applications, providing both wide-area and long-range 
coverage, and readily automated target detection capability. Examples of such sensor classes include 
signals intelligence (SIGINT) for collecting active emissions, moving target indication (MTI) radar for 
detecting and tracking moving targets against a high clutter background, and synthetic aperture radar 
(SAR) for detecting stationary targets. Appropriately cued, other sensor classes that possess narrower 
fields of view and that typically operate at much shorter ranges may be capable of providing higher 
fidelity measurement to support refined analysis. Geospatial and other intelligence databases can provide 
the static domain context within which the target-sensed data must be interpreted, while environmental 
sensors generate dynamic context estimates, such as weather and current atmospheric conditions. 

6.1.7 Commensurate and Noncommensurate Data 

Although the fusion of similar (commensurate) information would seem to be more straightforward 
than the fusion of dissimilar (noncommensurate) information, that is not always the case. Three examples 
are offered to highlight the varying degrees of difficulty associated with the combination of multiple- 
source data. First, consider the relative simplicity of fusing registered electronic intelligence (ELINT) data 
and real-time synthetic aperture radar (SAR) imagery. Although these sensors measure dramatically
different information dimensions, both sources provide reasonably wide area coverage, relatively good 
geolocation, and highly complementary information. As a consequence, the fusion process tends to be 
straightforward. Even when an ELINT sensor provides little more than target line-of-bearing, the ELINT 
and SAR measurements can potentially be combined by simply overlaying the two data sets. If the line- 
of-bearing intercepts a single piece of equipment in the SAR image, the radar system class, as well as its 
precise location, would be known. This information, in turn, can support the identification of other 
nearby objects in the image (e.g., missile launchers normally associated with track-while-scan radar). 
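
The overlay of an ELINT line-of-bearing on registered SAR detections can be sketched as follows. The coordinates, the one-degree tolerance, the detection names, and both helper functions are assumptions introduced for illustration; a flat local coordinate frame is assumed as well.

```python
import math


def bearing_deg(sensor_xy, target_xy):
    """Bearing from sensor to target in degrees clockwise from north (+y axis)."""
    dx = target_xy[0] - sensor_xy[0]
    dy = target_xy[1] - sensor_xy[1]
    return math.degrees(math.atan2(dx, dy)) % 360.0


def detections_on_bearing(sensor_xy, lob_deg, detections, tolerance_deg=1.0):
    """Return names of SAR detections lying within tolerance of the line of bearing."""
    hits = []
    for name, xy in detections.items():
        # Smallest signed angular difference, folded into [-180, 180).
        diff = (bearing_deg(sensor_xy, xy) - lob_deg + 180.0) % 360.0 - 180.0
        if abs(diff) <= tolerance_deg:
            hits.append(name)
    return hits


# Hypothetical registered SAR detections (km east, km north) and one ELINT intercept.
sar_detections = {"veh_1": (10.0, 10.0), "veh_2": (4.0, 12.0), "veh_3": (15.0, 3.0)}
elint_sensor = (0.0, 0.0)
elint_lob = 45.0  # degrees; emitter assumed classified as a track-while-scan radar

candidates = detections_on_bearing(elint_sensor, elint_lob, sar_detections)
if len(candidates) == 1:
    fused = {"object": candidates[0], "class": "track-while-scan radar",
             "location": sar_detections[candidates[0]]}
else:
    fused = None  # zero or several candidates: the overlay alone is ambiguous
```

When the line-of-bearing crosses exactly one detection, the emitter class from ELINT and the precise location from SAR are joined into a single report; otherwise the ambiguity must be resolved by other means.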

At the other end of the spectrum, the fusion of information from two or more identical sensors can 
present a significant challenge. Consider, for example, fusing data sets obtained from spatially separated 
forward-looking infrared (FLIR) sensors. Although FLIR imagery provides good azimuth and elevation
resolution, it does not directly measure range. Because the range and view angles to targets will be different 
for multiple sensors, combining such data sets demands sophisticated registration and normalization. 

Finally, consider the fusion of two bore-sighted sensors: a light intensifier and a forward-looking infrared
(FLIR) sensor. The former device amplifies low intensity optical images to enhance night vision. When coupled
with the human’s natural ability to separate moving objects from the relatively stationary background, such 
devices permit visualization of the environment and detection of both stationary and moving objects. 
However, such devices offer limited capability for the detection of stationary personnel and equipment 
located in deep shadows or under extremely low ambient light levels (e.g., heavy cloud cover, no moon, or 
inside buildings). Rather than detecting reflected energy, FLIR devices detect thermal radiation from objects. 
Consequently, these devices support the detection of humans, vehicles, and operating equipment based on 
their higher temperature relative to the background. Because the two sensors are bore-sighted, pixel-by-pixel
combination of the two separate images may be feasible, providing a highly effective night vision capability.
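
A minimal sketch of such a pixel-by-pixel combination is given below. It assumes the two images are already registered and equally sized, uses a simple weighted average as the combination rule, and works on invented 3 x 3 frames; none of these choices is prescribed by the text.

```python
def fuse_pixels(intensified, thermal, thermal_weight=0.5):
    """Weighted per-pixel blend of two registered, equally sized grayscale images."""
    if len(intensified) != len(thermal) or any(
        len(r1) != len(r2) for r1, r2 in zip(intensified, thermal)
    ):
        raise ValueError("images must share the same dimensions")
    w = thermal_weight
    return [
        [(1.0 - w) * a + w * b for a, b in zip(row_i, row_t)]
        for row_i, row_t in zip(intensified, thermal)
    ]


# Hypothetical 3x3 frames with pixel values in [0, 1].
# The intensifier shows the dimly lit scene but not the figure in deep shadow (center);
# the FLIR shows the warm figure but little background detail.
intensified = [[0.6, 0.7, 0.6], [0.5, 0.1, 0.5], [0.6, 0.7, 0.6]]
thermal = [[0.1, 0.1, 0.1], [0.1, 0.9, 0.1], [0.1, 0.1, 0.1]]

fused = fuse_pixels(intensified, thermal)
```

In the fused frame the center pixel, which was the darkest in the intensified image, becomes the brightest, while the background structure contributed by the intensifier is retained.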

6.2 Biologically Motivated Fusion Process Model 

A hierarchically organized functional-level model of data fusion is presented in Chapter 2. In contrast, 
this section focuses on a process-level model. While the functional model describes what analysis functions 
or processes need to be performed, a process-level model describes at a high level of abstraction how 
this analysis is accomplished. 

The goal of data fusion, as well as most other forms of data processing, is to turn data into useful 
information. In perhaps the simplest possible view, all of the required information is assumed to be 
present within a set of sensor measurements. Thus, the role of data fusion is extraction of information 
embedded in a data set (separating the wheat from the chaff). In this case, fusion algorithms can be 
characterized as a function of 

• Observables 

• Current situation description (e.g., target track files)

• A priori declarative knowledge (e.g., distribution functions, templates, constraint sets, filters, and 
decision threshold values). 

As shown in Figure 6.3(a), the fusion process output provides updates to the situation description, as 
well as feedback to the reasoning knowledge base to support knowledge refinement (learning). 

Signal processing, statistical hypothesis testing, target localization performed by intersecting two 
independently derived error ellipses, and target identification based on correlation of an image with a 
set of rigid templates are simple examples of such a fusion model. In general, this “information extraction” 
view of data fusion makes a number of unstated, simplifying assumptions including the existence of 

• Adequate information content in the sensor observables 

• Adequate sensor update rates 

• Homogeneous sensor data

• Relatively small number of readily distinguishable targets

• Relatively high resolution sensors

• High reliability sensors

• Full sensor coverage of the area of interest

• Stationary, Gaussian random interference.

FIGURE 6.3 (a) Basic fusion process model and (b) generalized process model.

When such assumptions are appropriate, data analysis tends to be straightforward and an “information 
extraction” fusion model is adequate. Rigid template-match paradigms typically perform well when a 
set of observables closely matches a single template and is uncorrelated with the balance of the templates.
Track association algorithms perform well against a small number of moving, widely spaced targets 
provided the radar generates relatively high update rates. The combination of similar features is often 
more straightforward than the combination of disparate features. When the sensor data possesses ade- 
quate information content, high confidence analysis is possible. High signal-to-noise ratios tend to 
enhance signal detection. High resolution sensors reduce ambiguity and uncertainty with respect to 
feature measurements (e.g., location and frequency). High reliability sensors maximize sensor availability. 
Adequate sensor coverage provides a “complete” view of the areas of interest. Statistical-based reasoning 
is generally simplified when signal interference can be modeled as a Gaussian random process. 
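
Of the examples cited above for the “information extraction” model, target localization from two independently derived error ellipses lends itself to a short sketch. One common formalization, assumed here rather than taken from the text, treats each ellipse as the covariance of an independent Gaussian position estimate and combines the two by inverse-covariance weighting. The estimates and helper names are hypothetical.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("singular covariance")
    return ((d / det, -b / det), (-c / det, a / det))


def add2(m, n):
    """Element-wise sum of two 2x2 matrices."""
    return tuple(tuple(x + y for x, y in zip(r, s)) for r, s in zip(m, n))


def matvec(m, v):
    """Product of a 2x2 matrix and a 2-vector."""
    return tuple(sum(x * y for x, y in zip(row, v)) for row in m)


def fuse_estimates(x1, p1, x2, p2):
    """Inverse-covariance weighted fusion of two independent 2-D Gaussian estimates."""
    i1, i2 = inv2(p1), inv2(p2)
    p = inv2(add2(i1, i2))  # fused covariance
    info = tuple(a + b for a, b in zip(matvec(i1, x1), matvec(i2, x2)))
    return matvec(p, info), p  # fused mean, fused covariance


# Hypothetical estimates (km): sensor 1 is precise in x, sensor 2 is precise in y.
x1, p1 = (10.0, 20.0), ((1.0, 0.0), (0.0, 9.0))
x2, p2 = (12.0, 21.0), ((9.0, 0.0), (0.0, 1.0))

x_fused, p_fused = fuse_estimates(x1, p1, x2, p2)
```

With these values the fused position is approximately (10.2, 20.9), and the fused variance in each axis (0.9) is smaller than that of either input, which is the benefit expected when the simplifying assumptions listed above hold.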

Typical applications where such assumptions are realistic include

• Track assignment in low target-density environments or for ballistic targets that obey well-estab- 
lished physical laws of motion 

• Classification of military organizations based on associated radio types 

• Detection of signals and targets exhibiting high signal-to-background ratio. 

However, numerous real-world data fusion tasks exhibit one or more of the following complexities: 

• Large number of target and nontarget entities (e.g., garbage trucks may be nearly indistinguishable 
from armored personnel carriers); 

• Within-class variability of individual targets (e.g., hatch open vs. hatch closed); 
• Low data rates (exacerbating track association problems);

• Multiple sensor classes (disparate numeric and symbolic observables can be difficult to combine); 

• Inadequate sensor coverage of areas of interest (i.e., inadequate number of sensors, obscuration 
due to terrain and foliage, radio frequency interference, weather, or countermeasures);

• Inadequate set of sensor observables (e.g., inadequate input space dimensionality); 

• Inadequate sensor resolution; 

• Registration and measurement errors; 

• Inadequate a priori statistical knowledge (e.g., unknown prior and conditional probabilities, mul- 
timodal density functions, or non-Gaussian and nonstationary statistics); 

• Processing and communication latencies; 

• High level-of-abstraction analysis product required (i.e., not merely platform location and iden- 
tification); 

• Complex propagation phenomena (e.g., multipath, diffraction, or atmospheric attenuation);

• Purposefully deceptive behavior. 

When such complexities exist, sensor-derived information tends to be incomplete, ambiguous, erro- 
neous, and difficult to combine and/or abstract. Thus, a data fusion process that relies on rigid compo- 
sition among (1) the observables, (2) the current situation description, and (3) a set of rigid templates 
or filters, tends to be fundamentally inadequate. 

As stated earlier, rather than simply “extracting” information from sensor-derived data, effective data 
fusion requires the combination, consolidation, organization, and abstraction of information. Such 
analysis can enhance the fusion product, its confidence, and its ultimate utility in at least four ways: 

1. Existing sensors can be improved to provide better resolution, accuracy, sensitivity, and reliability. 

2. Additional similar sensors can be employed to improve the coverage and/or confidence in the 
domain observables. 

3. Dissimilar sensors can be used to increase the dimensionality of the observation space, permitting 
the measurement of at least partially independent target attributes (a radar can offer excellent 
range and azimuth resolution, while an ELINT sensor can provide target identification). 

4. Additional domain knowledge and context constraints can be utilized. 

While the first three recommendations effectively increase the information content and/or dimensionality
of the observables, the last effectively reduces the decision space dimensionality by constraining
the possible decision states.

Observables can be treated as explicit knowledge (i.e., knowledge that is explicitly provided by the 
sensors). Context knowledge, on the other hand, represents implicit (or non-sensor-derived) knowledge. 
Although human analysts routinely use both forms in performing fusion tasks, automated approaches 
have traditionally relied almost exclusively on the former. 

As an example of the utility of implicit domain knowledge, consider the extrapolation of the track of 
a ground-based vehicle that has been observed moving along the relatively straight-line path shown in 
Figure 6.4. Although the target is a wheeled vehicle traveling along a road with a hairpin curve just beyond 
the last detection point, a purely statistical-based tracker will likely attempt to extend the track through 
the hill (the reason for the curve in the road) and into the lake on the other side. 
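
The contrast between the two predictions can be sketched with a road represented as a polyline. The road geometry, the helper names, and the assumption that the vehicle holds its speed along the road are all illustrative; they are not taken from Figure 6.4.

```python
import math


def extrapolate_straight(p_prev, p_last):
    """Constant-velocity prediction: repeat the last observed displacement."""
    return (p_last[0] + (p_last[0] - p_prev[0]),
            p_last[1] + (p_last[1] - p_prev[1]))


def extrapolate_on_road(road, start_index, distance):
    """Advance `distance` along a road polyline, starting at vertex `start_index`."""
    pos = road[start_index]
    for nxt in road[start_index + 1:]:
        seg = math.dist(pos, nxt)
        if seg > 0 and seg >= distance:
            f = distance / seg
            return (pos[0] + f * (nxt[0] - pos[0]), pos[1] + f * (nxt[1] - pos[1]))
        distance -= seg
        pos = nxt
    return pos  # ran off the end of the mapped road


# Hypothetical road (km): a straight run east followed by a hairpin back to the west.
road = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.5, 0.5), (2.0, 1.0), (1.0, 1.0)]
p_prev, p_last = road[1], road[2]   # last two detections, both on the road
step = math.dist(p_prev, p_last)    # distance covered per update interval

naive = extrapolate_straight(p_prev, p_last)      # leaves the road at the curve
constrained = extrapolate_on_road(road, 2, step)  # follows the hairpin
```

The unconstrained prediction continues due east to (3.0, 0.0), off the road, while the road-constrained prediction advances the same distance along the hairpin to roughly (2.29, 0.71).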

Although tracking aircraft, ballistic projectiles, and naval vessels using statistical-based motion models 
has been highly successful, adapting such algorithms to tracking ground vehicles has proved to be a 
considerable challenge. Tracked and wheeled vehicles typically exhibit many more degrees of freedom 
than a high performance aircraft or naval vessel because they can stop and move in an unpredictable 
manner. Additional complications include the potentially large numbers of ground vehicles, nonresolvable
individual vehicles, terrain and vegetation masking, and infrequent target update rates. However,
through the application of relevant domain constraints (e.g., mobility, observability, vehicle class behavior,
and vehicle group behavior), the expectation-based analysis process can be effectively constrained,
thus helping to manage the additional degrees of freedom. In much the same way that a system of
equations with too many unknowns does not produce a unique solution, “missing” domain knowledge
can lead to an “underdamped” Kalman filter solution to ground target tracking. In recognition of the
benefits of context-sensitive analysis, domain-sensitive ground target tracking models have received
considerable interest in recent years.

FIGURE 6.4 Road-following target tracking model.

©2001 CRC Press LLC

In addition to the importance of reasoning in context, the road-following target tracking problem also 
dramatically illustrates the critical role of paradigm selection in the algorithm development process. Rather 
than demonstrating the failure of a statistical-based tracker, the above example illustrates its misappli- 
cation. Applying a purely statistical approach to this problem assumes (perhaps unwittingly) that domain 
constraints are either irrelevant or insignificant. However, in this application, domain constraints tend 
to be stronger than the relatively weak constraints on platform motion provided by a strictly statistical- 
based motion model. 

Paradigm selection, in fact, must be viewed as a key component of successful data fusion automation. 
Consequently, algorithm developers must ensure that both the capability and limitations of a selected 
problem-solving paradigm are appropriately matched to the requirements of the fusion task they are 
attempting to automate. 

To illustrate the importance of both context-sensitive reasoning and paradigm selection, consider the 
problem of analyzing the time-stamped radar detections from multiple closely spaced targets, some with 
potentially crossing trajectories, as illustrated in Figure 6.5. A traditional statistical tracking algorithm 
typically associates the “closest” (with respect to a specified evaluation metric) new detection to an existing 
track. A human analyst, on the other hand, would quite naturally invoke a context-sensitive model of 
vehicle behavior. By employing multiple behavior models, alternative interpretations of the observations 
can be made. False hypotheses can be eliminated once adequate information is obtained to resolve the 
associated ambiguity. 

Emulating such an analysis strategy requires the time-stamped detections to be associated with local 
cultural and topographic features. In addition, the analysis model(s) must accommodate individual 
vehicle-class capabilities, as well as a priori class-specific behavioral knowledge. By doing so, it can be 
inferred that tracks 1-3 would be highly consistent with a road-following behavior, tracks 4 and 5 would 
be determined to be most consistent with a minimum terrain-gradient following behavior, while track 
6 would be found to be inconsistent with any ground-based vehicle behavior model. By evaluating track 
updates from targets 1-3 with respect to road association, estimated vehicle speed, and observed inter- 
target spacing (assuming individual targets are resolvable), it can be deduced that targets 1-3 are wheeled 
FIGURE 6.5 Example of the fusion of multiple-target tracks over time. 

vehicles traveling in a convoy along a secondary road. Based on the maximum observed vehicle speeds 
and the associated surface conditions along their trajectories, tracks 4 and 5 can be deduced to be tracked 
vehicles. Finally, because of its relatively high speed and the rugged terrain in the vicinity, track 6 would 
be determined to be most consistent with a low-flying airborne target. Because the velocity of target 6 
is too low to be a fixed-wing aircraft, the target can be inferred to be a helicopter. 
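
The chain of inferences above can be sketched as a small rule cascade. This is only an illustrative sketch: the `TrackSummary` fields, the function name, and both speed limits are hypothetical and are not taken from the chapter, which bases its deductions on richer evidence (inter-target spacing, surface conditions, local terrain).

```python
from dataclasses import dataclass

# Illustrative limits only (assumptions, not values from the chapter).
MAX_GROUND_SPEED_OFF_ROAD_KMH = 60.0
MIN_FIXED_WING_SPEED_KMH = 200.0


@dataclass
class TrackSummary:
    follows_road: bool          # detections lie along a mapped road
    follows_min_gradient: bool  # path follows the minimum terrain gradient
    max_speed_kmh: float        # highest speed observed along the track


def interpret(track: TrackSummary) -> str:
    """Return the behavior model most consistent with a track summary."""
    if track.follows_road:
        return "road-following ground vehicle"
    if (track.follows_min_gradient
            and track.max_speed_kmh <= MAX_GROUND_SPEED_OFF_ROAD_KMH):
        return "terrain-following ground vehicle"
    if track.max_speed_kmh > MAX_GROUND_SPEED_OFF_ROAD_KMH:
        # Too fast for any ground-vehicle model in this terrain.
        if track.max_speed_kmh < MIN_FIXED_WING_SPEED_KMH:
            return "helicopter"
        return "fixed-wing aircraft"
    return "unresolved"
```

A track that matches no behavior model is reported as unresolved rather than forced into a class, mirroring the idea that hypotheses are eliminated only once adequate information is available.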

Targets may be moving at one instant of time and stationary at another and communicating during 
one interval and silent during another, resulting in four mutually exclusive target states: (1) moving, 
nonemitting; (2) moving, emitting; (3) nonmoving, nonemitting; and (4) nonmoving, emitting. Over 
time, many entities in the domain may change between two or more of these four states. Thus, if the 
situation awareness product is to be continuously maintained, data fusion inherently involves a recursive 
analysis. Table 6.1 provides a mapping between these four target states and a wide range of sensor classes. 
As shown, the ability to track entities through these state changes effectively requires multiple source 
sensor data. 
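
The four states arise from two independent binary attributes (moving or not, emitting or not). A minimal sketch, using hypothetical names, of how a recursive analysis might label each observation and record the state changes over time:

```python
from enum import Enum
from typing import List, Tuple


class TargetState(Enum):
    # Each value is the pair (moving, emitting).
    MOVING_NONEMITTING = (True, False)
    MOVING_EMITTING = (True, True)
    NONMOVING_NONEMITTING = (False, False)
    NONMOVING_EMITTING = (False, True)


def classify(moving: bool, emitting: bool) -> TargetState:
    """Map the two binary attributes onto one of the four exclusive states."""
    return TargetState((moving, emitting))


def state_changes(
    history: List[Tuple[bool, bool]],
) -> List[Tuple[int, TargetState, TargetState]]:
    """Return (index, old_state, new_state) for every transition in a history."""
    states = [classify(m, e) for m, e in history]
    return [
        (i, states[i - 1], states[i])
        for i in range(1, len(states))
        if states[i] != states[i - 1]
    ]
```

Which sensor classes can observe each state is the subject of Table 6.1 and is deliberately not encoded here.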

In general, individual targets exhibit complex patterns of behavior that can help discriminate object 
classes and identify activities of interest. Consider the scenario depicted in Figure 6.6, showing the 
movement of a tactical erectable missile launcher (TEL) between time t_0 and time t_6. At t_0, the vehicle is 
in a location that makes it difficult to detect. At t_1, the vehicle is moving along a dirt road at velocity v_1. 
At time t_2, the vehicle continues along the road and begins communicating with its support elements. 
At time t_3, the vehicle is traveling off road at velocity v_3 along a minimum terrain gradient path. At time 
t_4, the target has stopped moving and begins to erect its launcher. At time t_5, just prior to launch, radar 
emissions begin. At time t_6, the vehicle is traveling to a new hide location at velocity v_6. 



TABLE 6.1 Mapping between Sensor Classes and Target States 

FIGURE 6.6 Dynamic target scenario showing sensor snapshots over time. 



TABLE 6.2 Interpretation of Scenario Depicted in Figure 6.6 

Table 6.2 identifies sensor classes that could contribute to the detection and identification of the various 
target states. Opportunities for effective sensor cross cueing for the TEL scenario discussed earlier are 
shown in the “Potentially Contributing Sensors” column. At the lowest level of abstraction, observed 






TABLE 6.3 Mapping between Sensor Classes and Activities for a Bridging Operation 



State MTI Radar SAR COMINT ELINT FLIR Optical Acoustic 

Engineers move to river bank • • • 

Construction activity • • ... . 

Forces move toward river bank • • • 

Forces move from opposite side of river • • • 



behavior can be interpreted with respect to a highly local perspective, as indicated in column 6, “Local 
Interpretation.” By assuming that the object is performing some higher level behavior, progressively more 
global interpretations can be developed as indicated in columns 7 and 8. 

Individual battle space objects are typically organized into operational or functional-level units, 
enabling observed behavior among groups of objects to be analyzed to generate higher level situation 
awareness products. Table 6.3 categorizes the behavioral fragments of an engineer battalion engaged in 
a bridge-building operation and identifies sensors that could contribute to the recognition of each 
fragment. 

Situation awareness development involves the recursive refinement of a composite multiple level-of- 
abstraction scene description. Consequently, the generalized fusion process model shown in Figure 6.3(b) 
supports the effective combination of (1) domain observables, (2) a priori reasoning knowledge, and 
(3) the multiple level-of-abstraction/multiple-perspective fusion product. The process refinement loop 
controls both effective information combination and collection management. Each element of the process 
model is potentially sensitive to implicit (non-sensor-derived) domain knowledge. 

6.3 Fusion Process Model Extensions 

Recasting the generalized fusion process model within a biologically motivated framework establishes its 
relationship to the more familiar manual analysis paradigm. With suitable extensions, this biological 
framework leads to the development of a problem-solving taxonomy that categorizes the spectrum of 
machine-based approaches to reasoning. Drawing on this taxonomy of problem solving approaches helps 
to 



• Reveal underlying similarities and differences between apparently disparate data analysis paradigms, 

• Explore fundamental shortcomings of classes of machine-based reasoning approaches, 

• Demonstrate the critical role of a database management system in terms of its support to both 
algorithm development and algorithm performance, 

• Identify opportunities for developing more powerful approaches to machine-based reasoning. 

6.3.1 Short-, Medium-, and Long-Term Knowledge 

The various knowledge forms involved in the fusion process model can be compared with short-term, 
medium-term and long-term memory. Short-term memory retains highly transient short-term knowledge; 
medium-term memory retains dynamic, but somewhat less transient medium-term knowledge;* and long- 
term memory retains relatively static long-term knowledge. Thus, just as short-, medium-, and long-term 
memory suggest the durability of the information in biological systems, short-, medium-, and long-term 
knowledge relate to the durability of the information in machine-based reasoning applications. 



* In humans, medium-term memory appears to be stored in the hippocampus in a midprocessing state between 
short-term and long-term memory, helping to explain why, after a trauma, a person often loses all memory of 
the preceding few minutes to few days. 
FIGURE 6.7 Biologically motivated metaphor for the data fusion process. 

Within this metaphor, sensor data relates to the short-term knowledge, while long-term knowledge 
relates to relatively static factual and procedural knowledge. Because the goal of both biological and 
artificial situation awareness systems is the development and maintenance of the current relevant percep- 
tion of the environment, the dynamic situation description represents medium-term memory. In both 
biological and tactical data fusion systems, "current" emphasizes the character of the dynamically changing 
scene under observation, as well as the potentially time-evolving analysis process that could involve 
interactions among a network of distributed fusion processes. Because of memory limitations and the critical 
role that medium-term memory plays in both biological and artificial situation awareness systems, only 
relevant states can be maintained. Because sensor measurements are inherently information-limited, real-world 
events are often nondeterministic, and uncertainties often exist in the reasoning process, a disparity 
between perception and reality must be expected. 

As illustrated in Figure 6.7, sensor observables represent short-term declarative knowledge and the 
situation description represents medium-term declarative knowledge. Templates, filters, and the like are 
static declarative knowledge; domain knowledge includes both static (long-term) and dynamic (medium- 
and short-term) declarative context knowledge; and F represents the fusion process reasoning (long-term 
procedural) knowledge. Thus, as in biological situation awareness development, machine-based 
approaches require the interaction among short-, medium-, and long-term declarative knowledge, as 
well as long-term procedural knowledge. Medium-term knowledge tends to be highly perishable, while 
long-term declarative and procedural knowledge is both learned and forgotten much more slowly. With 
the exception of the difference in the time constants, learning of long-term knowledge and update of the 
situation description are fully analogous operations. 

In general, short-, medium-, and long-term knowledge can be either context-sensitive or context- 
insensitive. In this chapter, context is treated as a conditional dependency among objects, attributes, or 
functions (e.g., f(x_1, x_2 | x_3 = a)). Thus, context represents both explicit and implicit dependencies or 
conditioning that exist as a result of the state of the current situation representation or constraints 
imposed by the domain and/or the environment. 
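
As a hedged illustration of context as a conditional dependency, the sketch below evaluates a function of two attributes conditioned on a third. The attribute choices (observed speed, vehicle class, surface type), the function name, and every numeric limit are invented for illustration and do not come from the chapter.

```python
# Hypothetical maximum speeds (km/h) for (vehicle_class, surface) pairs.
# All values are illustrative assumptions, not data from the chapter.
MAX_SPEED_KMH = {
    ("wheeled", "road"): 90.0,
    ("wheeled", "off_road"): 25.0,
    ("tracked", "road"): 60.0,
    ("tracked", "off_road"): 40.0,
}


def speed_is_plausible(speed_kmh: float, vehicle_class: str, surface: str) -> bool:
    """Evaluate f(x_1, x_2 | x_3 = a): the consistency of an observed speed (x_1)
    with a vehicle class (x_2), conditioned on the surface context (x_3)."""
    limit = MAX_SPEED_KMH[(vehicle_class, surface)]
    return 0.0 <= speed_kmh <= limit
```

The same observation can therefore be accepted or rejected depending only on context: under these assumed limits, a wheeled vehicle at 50 km/h is plausible on a road but not off road.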

Short-term knowledge is dynamic, perishable, and highly context sensitive. Medium-term knowledge 
is less perishable and is learned and forgotten at a slower rate than short-term knowledge. Medium-term 
knowledge maintains the context-sensitive situation description at all levels of abstraction. The inherent 
context-sensitivity of short- and medium-term knowledge indicates that effective interpretation can be 
achieved only through consideration of the broadest possible context. 

Long-term knowledge is relatively nonperishable information that may or may not be context-sensitive. 
Context-insensitive long-term knowledge is either generic knowledge, such as terrain/elevation, 
soil type, vegetation, waterways, cultural features, system performance characteristics, and coefficients 
of fixed-parameter signal filters, or context-free knowledge that simply ignores any domain sensitivity. 
Context-sensitive long-term knowledge is specialized knowledge, such as enemy Tables of Equipment, 
context-conditioned rule sets, doctrinal knowledge, and special-purpose two-dimensional map overlays 
(e.g., mobility maps or field-of-view maps). The specialization of long-term knowledge can be either 
fixed (context-specific) or conditionally dependent on dynamic or static domain knowledge 
(context-general). 

Attempts at overcoming limitations of context-free algorithms often relied on fixed context algorithms 
that lack both generality and extensibility. The development of algorithms that are implicitly sensitive to 
relevant domain knowledge, on the other hand, tends to produce algorithms that are both more powerful 
and more extensible. Separate management of these four classes of knowledge potentially enhances 
database maintainability. 

6.3.2 Fusion Classes 

The fusion model depicted in Figure 6.3(b) views the process as a composition of (1) short-term 
declarative, (2) medium-term declarative, (3) long-term declarative, and (4) long-term procedural 
knowledge. Based on such a characterization, 15 distinct data fusion classes can be defined, as illustrated by 
Table 6.4, representing all nonempty combinations of the four classes of knowledge (2^4 - 1 = 15). 
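
The count of 15 can be checked by enumerating every nonempty subset of the four knowledge classes. The sketch below is only a counting aid; the order in which it lists the combinations is an assumption and need not match the row order of Table 6.4.

```python
from itertools import combinations

KNOWLEDGE_CLASSES = (
    "short-term declarative",
    "medium-term declarative",
    "long-term declarative",
    "long-term procedural",
)


def fusion_classes():
    """Return every nonempty combination of the four knowledge classes."""
    return [
        combo
        for size in range(1, len(KNOWLEDGE_CLASSES) + 1)
        for combo in combinations(KNOWLEDGE_CLASSES, size)
    ]


classes = fusion_classes()
print(len(classes))  # 4 + 6 + 4 + 1 = 15
```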

Fusion classes provide a simple characterization of fusion algorithms, permitting a number of straight- 
forward observations to be made. For example, only algorithms that employ short-term knowledge are 
sensitive to a dynamic input space, while only algorithms that employ medium-term knowledge are 
sensitive to the existing situation awareness product. Only algorithms that depend on long-term declar- 
ative knowledge are sensitive to static domain constraints. 

While data fusion algorithms can rely on any possible combination of short-term, medium-term, and 
long-term declarative knowledge, every algorithm employs some form of procedural knowledge. Such 
knowledge may be either explicit or implicit. Implicit procedural knowledge is implied knowledge, while 
explicit procedural knowledge is formally represented knowledge. In general, implicit procedural knowledge 
tends to be associated with rigid analysis paradigms (e.g., cross correlation of two signals), whereas 
explicit procedural knowledge supports more flexible and potentially more powerful reasoning forms 
(e.g., model-based reasoning). 

All fusion algorithms rely on some form of procedural knowledge; therefore, the development of a 
procedural knowledge taxonomy provides a natural basis for distinguishing approaches to machine-based 
reasoning. For our purposes, procedural knowledge will be considered to be long-term declarative knowl- 
edge and its associated control knowledge. Long-term declarative knowledge, in turn, is either specific or 
general. Specific declarative knowledge represents fixed (static) facts, transformations, or templates, such 
as filter transfer functions, decision trees, sets of explicit relations, object attributes, exemplars, or 
univariate density functions. General declarative knowledge, on the other hand, characterizes not just 
the value of individual attributes, but the relationships among attributes. Thus, object models, produc- 
tion-rule condition sets, parametric models, joint probability density functions, and semantic constraint 
sets are examples of general long-term declarative knowledge. Consequently, specific long-term declarative 
knowledge supports relatively fixed and rigid reasoning, while general long-term declarative knowledge 
supports more flexible approaches to reasoning. 

Fusion algorithms that rely on specific long-term declarative knowledge are common when these three 
conditions all hold true: 

• The decision process has relatively few degrees of freedom (attributes, parameters, dimensions). 

• The problem attributes are relatively independent (no complex interdependencies among 
attributes). 

• Relevant reasoning knowledge is static. 

Thus, static problems characterized by moderate-sized state spaces and static domain constraints tend 
to be well served by algorithms that rely on specific long-term declarative knowledge. 

At the other end of the spectrum are problems that possess high dimensionality and complex depen- 
dencies and are inherently dynamic. For such problems, reliance on algorithms that employ specific long- 
term declarative knowledge inherently limits the robustness of their performance. While such algorithms 
might yield acceptable performance for highly constrained problem sets, their performance tends to 
degrade rapidly as conditions deviate from nominal or as the problem set is generalized. In addition, 
dependence on specific declarative knowledge often leads to computation and/or search requirements 
exponentially related to the problem size. Thus, algorithms based on general long-term declarative 
knowledge can offer significant benefits when one or more of the following hold: 

• The decision process has a relatively large number of degrees of freedom. 

• The relationships among attributes are significant (attribute dependency). 

• Reasoning is temporally sensitive. 

Control knowledge can be grouped into two broad classes: rigid and flexible. Rigid control knowledge 
is appropriate for simple, routine tasks that are static and relatively context-insensitive. The computation 
of the correlation coefficient between an input data set and a set of stored exemplar patterns is an example 
of a simple rigid control strategy. Flexible control knowledge, on the other hand, supports more complex 
strategies, such as multiple-hypothesis, opportunistic, and mixed-initiative approaches to reasoning. In 
addition to being flexible, such knowledge can be characterized as either single level-of-abstraction or 
multiple level-of-abstraction. The former implies a relatively local control strategy, while the latter supports 
more global reasoning strategies. Based on these definitions, four distinct classes of control knowledge exist: 

• Rigid, single level-of-abstraction; 

• Flexible, single level-of-abstraction; 

• Rigid, multiple level-of-abstraction; 

• Flexible, multiple level-of-abstraction. 

Given the two classes of declarative knowledge and the four classes of control knowledge, there exist eight 
distinct forms of procedural knowledge. 

In general, there are two fundamental approaches to reasoning: generation-based and hypothesis-based. 
Viewing analysis as a “black box” process with only its inputs and outputs available enables a simple 
distinction to be made between the two reasoning modalities. Generation-based problem-solving 
approaches “transform” a set of input states into output states; hypothesis-based approaches begin with 
output states and hypothesize and, ultimately, validate input states. Numerous reasoning paradigms such 
as filtering, neural networks, template match approaches, and forward-chained expert systems rely on 
TABLE 6.5 Biologically Motivated Problem-Solving Form Taxonomy 

generation-based reasoning. Other paradigms, such as backward-chained expert systems and certain 
graph-based and model-based reasoning approaches, rely on the hypothesis-based paradigm. Hybrid 
approaches utilize both reasoning modalities. 

In terms of object-oriented reasoning, generation-based approaches tend to emphasize bottom-up 
analysis, while hypothesis-based reasoning often relies on top-down reasoning. Because both generation- 
based and hypothesis-based approaches can utilize any of the eight forms of procedural knowledge, 16 
canonical problem solving (or paradigm) forms can be defined, as shown in Table 6.5. 

Existing problem-solving taxonomies are typically constructed in a bottom-up fashion, by clustering 
similar problem-solving techniques and then grouping the clusters into more general categories. The 
categorization depicted in Table 6.5, on the other hand, being both hierarchical and complete, represents 
a true taxonomy. In addition to a convenient organizational framework, this taxonomy forms the basis 
of a “capability-based” paradigm classification scheme. 
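
The 16 forms can be generated mechanically as the product of four binary choices. The sketch below assumes a numbering in which odd numerals are generation-based and even numerals hypothesis-based, and orders the axes so that the numerals agree with the form descriptions given in Section 6.3.3. The chapter states the generation/hypothesis pairing explicitly only for some pairs (for example, I and II, and XV and XVI), so applying it to every pair is an assumption, as are the dictionary keys.

```python
from itertools import product

ROMAN = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII",
         "IX", "X", "XI", "XII", "XIII", "XIV", "XV", "XVI"]


def canonical_forms():
    """Enumerate the 16 canonical problem-solving forms.

    itertools.product varies the last axis fastest, so consecutive numerals
    differ only in reasoning mode, then in levels of abstraction, and so on.
    """
    axes = product(
        ("specific", "general"),                   # declarative knowledge
        ("rigid", "flexible"),                     # control flexibility
        ("single", "multiple"),                    # levels of abstraction
        ("generation-based", "hypothesis-based"),  # reasoning mode
    )
    return {
        numeral: {"declarative": d, "control": c, "levels": lv, "reasoning": r}
        for numeral, (d, c, lv, r) in zip(ROMAN, axes)
    }
```

For example, `canonical_forms()["X"]` describes a hypothesis-based form with general declarative knowledge and rigid, single level-of-abstraction control.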

6.3.3 Fusion Classes and Canonical Problem-Solving Forms 

Whereas a fusion class characterization categorizes the classes of data utilized by a fusion algorithm, the 
canonical problem solving form taxonomy can help characterize the potential robustness, context-sensi- 
tivity, and efficiency of a given algorithm. Thus, the two taxonomies serve different, yet fully comple- 
mentary purposes. 

6.3.3.1 The Lower-Order Canonical Forms 

6.3.3.1.1 Canonical Forms I and II 

Canonical forms I and II represent the simplest generation-based and hypothesis-based analysis 
approaches, respectively. Both of these canonical forms employ specific declarative knowledge and simple, 
rigid, single level-of-abstraction control. Algorithms based on these canonical form approaches generally 

• Perform rather fixed data-independent operations, 

• Support only implicit temporal reasoning (time series analysis), 

• Rely on explicit inputs, 

• Treat problems at a single level-of-abstraction. 
Signal processing, correlation-based analysis, rigid template match, and artificial neural systems are 
typical examples of these two canonical forms. Such approaches are straightforward to implement; 
therefore, examples of these two forms abound. 

Early speech recognition systems employed relatively simple canonical form I class algorithms. In these 
approaches, an audio waveform of individual spoken words was correlated with a set of prestored 
exemplars of all words in the recognition system’s vocabulary. The exemplar achieving the highest 
correlation above some threshold was declared the most likely candidate. Because the exemplars were 
obtained during a training phase from the same individual later used to test the system, these systems were 
highly speaker-dependent. The algorithm clearly relied on specific declarative knowledge (specific exem- 
plars) and rigid, single level-of-abstraction control (exhaustive correlation followed by rank ordering of 
candidates). Although easy to implement and adequate in certain idealized environments (speaker- 
dependent, high signal-to-noise ratio, nonconnected word-speech applications), the associated exhaustive 
generation-and-test operation made the approach too inefficient for large vocabulary systems, and too 
brittle for noisy, speaker-independent, and connected-speech applications. 
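
The exhaustive correlate-and-rank strategy described above can be sketched in a few lines. This is a simplified illustration, not a reconstruction of any historical system: it assumes the input and all exemplars are equal-length, time-aligned sample sequences and that the exemplar dictionary is nonempty, and the function names and the default threshold are hypothetical.

```python
from math import sqrt
from typing import Dict, Optional, Sequence


def correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm = sqrt(sum((x - mean_a) ** 2 for x in a)
                * sum((y - mean_b) ** 2 for y in b))
    return cov / norm if norm > 0 else 0.0


def recognize(signal: Sequence[float],
              exemplars: Dict[str, Sequence[float]],
              threshold: float = 0.8) -> Optional[str]:
    """Correlate the input with every stored exemplar (specific declarative
    knowledge) and return the best-scoring word, or None if no score exceeds
    the threshold (rigid, single level-of-abstraction control)."""
    scores = {word: correlation(signal, ex) for word, ex in exemplars.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > threshold else None
```

Every input is compared with every exemplar, so the cost grows linearly with vocabulary size, and any misalignment or noise directly lowers the score; both effects are consistent with the inefficiency and brittleness noted above.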

Although artificial neural systems (ANS) are motivated by their biological counterparts, the current capabilities 
of undifferentiated ANS generally fall short of the performance of even simple 
biological organisms. Whereas humans are capable of complex, context-sensitive, multiple level-of- 
abstraction reasoning based on robust world models, ANS effectively filter or classify a set of input states. 
While humans can learn as they perform tasks, the ANS weight matrix is typically frozen (except in 
certain forms of clustering) during the state-transition process. 

Regardless of the type of training, the nature of the nonlinearity imposed by the algorithm, or the 
specific details of the connection network, pretrained ANS represent static, specific long-term declarative 
knowledge; the associated control element is clearly static, rigid, and single level-of-abstraction. Most 
neural networks are used in generation-based processing applications and therefore possess the key 
characteristics of canonical form I problem-solving approaches. Typical of canonical form I approaches, 
neural network performance tends to be brittle for problems of general complexity (because they are not 
model based) and non-context-sensitive (because they rely on either a context-free or highly context- 
specific weight matrix). Widely claimed properties of neural networks, such as robustness and ability to 
generalize, tend to be dependent on the data set and on the nature and extent of data set preprocessing. 

Although the computational requirements of most canonical form I problem-solving approaches 
increase dramatically with problem complexity, artificial neural systems can be implemented using high 
concurrency hardware realizations to effectively overcome this limitation. Performance issues are not 
necessarily eliminated, however, because before committing a network to hardware (and during any 
evolutionary enhancements), extensive retraining and testing may be required. 

6.3.3.1.2 Canonical Forms III-VIII 

Canonical form III and IV algorithms utilize specific declarative knowledge and rigid, multiple level-of- 
abstraction control knowledge. Although such algorithms possess most of the limitations of the lowest order 
problem solving approaches, canonical form III and IV algorithms, by virtue of their support to multiple 
level-of-abstraction control, tend to be somewhat more efficient than canonical forms I and II. Simple 
recursive, multiple resolution, scale-space, and relaxation-based algorithms are examples of these forms. 

As with the previous four problem-solving forms, canonical form V and VI algorithms rely on specific 
declarative knowledge. However, rather than rigid control, these algorithms possess a flexible, single level- 
of-abstraction control element that can support multiple hypothesis approaches, dynamic reasoning, and 
limited context-sensitivity. 

Canonical form VII and VIII approaches employ specific declarative knowledge and flexible, multiple 
level-of-abstraction control knowledge. Although fundamentally non-model-based reasoning forms, these forms 
support flexible, mixed top-down/bottom-up reasoning. 

6.3.3.2 The Higher-Order Canonical Forms 

As a result of their reliance on specific declarative knowledge, the eight lower-order canonical form approaches 
represent the core of most numeric-based approaches to reasoning. In general, these lower-order form 
approaches are unable to effectively mimic the high-level semantic and cognitive processes employed by 
human decision makers. The eight higher-order canonical forms, on the other hand, provide significantly 
better support to semantic and symbolic-based reasoning. 

6.3.3.2.1 Canonical Forms IX and X 

Canonical forms IX and X rely on general declarative knowledge and rigid, single level-of-abstraction 
control, representing simple model-based transformation and model-based constraint set evaluation 
approaches, respectively. General declarative knowledge supports more dynamic and more context- 
sensitive reasoning than specific declarative knowledge. However, because these two canonical forms rely 
on rigid, single level-of-abstraction control, canonical form IX and X algorithms tend to be inefficient. 

The motivation behind expert system development was to emulate the human reasoning process in a 
restricted problem domain. An expert system rule-set generally contains both formal knowledge (e.g., 
physical laws and relationships), as well as heuristics and “rules-of-thumb” gleaned from practical expe- 
rience. Although expert systems can accommodate rather general rule condition and action sets, the 
associated control structure is typically quite rigid (i.e., sequential condition set evaluation, followed by 
straightforward resolution of which instantiated rules should be allowed to fire). In fact, the separation 
of procedural knowledge into modular IF/THEN rule-sets (general declarative knowledge) that are 
evaluated using a rigid, single level-of-abstraction control structure (rigid control knowledge) represents 
the hallmark of the pure production-rule paradigm. Thus, demanding rule modularity and a uniform 
control structure effectively relegates conventional expert system approaches to the two lowest-order, 
model-based, problem-solving forms. 
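
A minimal sketch of the pure production-rule paradigm described above: modular IF/THEN rules evaluated by a fixed, sequential control loop. The rule representation, the function name, and the two example rules are hypothetical; the rules loosely echo the road-following example earlier in the chapter and are not drawn from any fielded system.

```python
from typing import Callable, List, Set, Tuple

# A rule is (name, condition over the current fact set, facts it asserts).
Rule = Tuple[str, Callable[[Set[str]], bool], Set[str]]


def forward_chain(facts: Set[str], rules: List[Rule]) -> Set[str]:
    """Rigid control loop: scan the rules in fixed order, fire the first rule
    whose condition holds and that adds a new fact, then rescan from the top.
    Stop when a full scan fires nothing."""
    facts = set(facts)
    while True:
        for _name, condition, consequents in rules:
            if condition(facts) and not consequents <= facts:
                facts |= consequents  # fire the rule
                break                 # restart the sequential scan
        else:
            return facts              # no rule fired: quiescence


RULES: List[Rule] = [
    ("convoy", lambda f: {"road_following", "regular_spacing"} <= f, {"convoy"}),
    ("road-follower", lambda f: {"on_road", "moving"} <= f, {"road_following"}),
]
```

The rules themselves are general and modular, but the control strategy never changes with the problem state, which is the combination that places this style of system among canonical forms IX and X.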

6.3.3.2.2 Canonical Forms XI through XIV 

Problem solving associated with canonical forms XI and XII relies on a general declarative element and 
rigid, multiple level-of-abstraction control. Consequently, these forms support both top-down and bot- 
tom-up reasoning. Production rule paradigms that utilize a hierarchical rule-set are an example of such 
an approach. 

Canonical forms XIII and XIV employ procedural knowledge that possesses a general declarative 
element and flexible, single level-of-abstraction control. As a result, these canonical forms can support 
sophisticated single level-of-abstraction, model-based reasoning. 

6.3.3.2.3 Canonical Forms XV and XVI 

Canonical form XV and XVI paradigms employ general declarative knowledge and flexible, multiple 
level-of-abstraction control; therefore, they represent the most powerful generation-based and hypoth- 
esis-based problem-solving forms, respectively. Although few canonical form XV and XVI fusion algo- 
rithms have achieved operational status, efficient algorithms that perform sophisticated, model-based 
reasoning, while meeting rather global optimality criteria, can be reasonably straightforward to develop.[1] 

The HEARSAY speech understanding system [2] was an early attempt at building a higher-order reasoning 
system. This system, developed in the early 1980s, treated speech recognition as both inherently context- 
sensitive and multiple level-of-abstraction. HEARSAY employed a hierarchy of models appropriate at the 
various levels-of-abstraction within the problem domain, from signal processing to perform formant 
tracking and spectral analysis for phoneme extraction, to symbolic reasoning for meaning extraction. 
Higher-level processes, with their broader perspective and higher-level knowledge, provided some level 
of control over the lower-level processes. Importantly, HEARSAY viewed speech understanding in a 
holistic fashion with each level of the processing hierarchy treated as a critical component of the fully 
integrated analysis process. 

6.3.3.3 Characteristics of the Higher-Order Canonical Forms 

Five key algorithm issues have surfaced during the preceding discussion: 

• Robustness 

• Context-sensitivity 

• Extensibility 
• Maintainability 

• Efficiency 

Each of these issues is discussed briefly below. 

6.3.3.3.1 Robustness 

Robustness measures the fragility of a problem-solving approach to changes in the input space. Algorithm 
robustness depends, quite naturally, on both the quality and efficacy of the models employed. The 
development of an “adequate” model depends, in turn, on the complexity of the process being modeled. 
A problem that intrinsically exhibits few critical degrees of freedom would logically require a simpler 
model than one that possesses many highly correlated features. 

As a simple illustration, consider the handwritten character recognition problem. Although handwrit- 
ten characters possess a large number of degrees-of-freedom (e.g., line thickness, character orientation, 
style, location, size, color, darkness, and contrast ratio), a simple model can capture the salient attributes 
of the character “H” (i.e., two parallel lines connected at their approximate centers by a third line 
segment). Thus, although the handwritten character intrinsically possesses many degrees-of-freedom, 
most are not relevant for distinguishing the letter “H” from other handwritten characters. Conversely, 
in a non-model-based approach, each character must be compared with a complete set of exemplar 
patterns for all possible characters. Viewed from this perspective, a non-model-based approach can 
require consideration of all combinations of both relevant and nonrelevant problem attributes. 

6.3.3.3.2 Context Sensitivity 

Context refers to both the static domain constraints (natural and cultural features, physical laws) and 
dynamic domain constraints (current location of all air defense batteries) relevant to the problem-solving 
process. Dynamic short-term and medium-term knowledge are generally context-sensitive, while a priori 
long-term reasoning knowledge may or may not be sensitive to context. 

Context-sensitive long-term knowledge (both declarative and procedural) is conditional knowledge 
that must be specialized by static or dynamic domain knowledge (e.g., mobility map or current dynamic 
Order of Battle). Context-insensitive knowledge is generic, absolute, relatively immutable knowledge that 
is effectively domain independent (e.g., terrain obscuring radar coverage or wide rivers acting as obstacles 
to ground-based vehicles). Such knowledge is fundamentally unaffected by the underlying context. 
Context-specific knowledge is long-term knowledge that has been specialized for a given, fixed context. 
Context-free knowledge simply ignores any effects related to the underlying context. 

In summary, context-sensitivity is a measure of a problem’s dependency on implicit domain knowledge 
and constraints. As such, canonical forms I-IV are most appropriate for tasks that require either context-insensitive 
or context-specific knowledge. Because canonical forms V-VIII possess flexible control, all are 
potentially sensitive to problem context. General declarative knowledge can be sensitive to non-sensor- 
derived domain knowledge (e.g., a mobility map, the weather, the current ambient light level, or the 
distance to the nearest river); therefore, all higher order canonical forms are potentially context-sensitive. 
Canonical forms XIII-XVI support both context-sensitive declarative and context-sensitive control knowl- 
edge and, therefore, are the only fully context-sensitive problem-solving forms. 

6.3.3.3.3 Extensibility and Maintainability 

Extensibility and maintainability are two closely related concepts. Extensibility measures the “degree of 
difficulty” of extending the knowledge base to accommodate domain changes or to support related 
applications. Maintainability measures the “cost” of storing and updating knowledge. Because canonical 
forms I-VIII rely on specific declarative knowledge, significant modifications to the algorithm can be 
required for even relatively minor domain changes. Alternatively, because they employ general declarative 
knowledge, canonical forms IX-XVI tend to be much more extensible. 

The domain sensitivity of the various canonical form approaches varies considerably. The lower-order 
canonical form paradigms typically rely on context-free and context-specific knowledge, leading to 
relatively nonextensible algorithms. Because context-specific knowledge may be of little value when the 
problem context changes (e.g., a mobility map that is based on dry conditions cannot be used to support 
analysis during a period of flooding), canonical form I-IV approaches tend to exhibit brittle performance 
as the problem context changes. Attempting to support context-sensitive reasoning using context-specific 
knowledge can lead to significant database maintainability problems. 

FIGURE 6.8 General characteristics of the sixteen canonical fusion forms and associated problem-solving paradigms. 

Conversely, context-insensitive knowledge (e.g., road, bridge, or terrain-elevation databases) is unaf- 
fected by context changes. Context-insensitive knowledge remains valid when the context changes; how- 
ever, context-sensitive knowledge may need to be redeveloped. Therefore, database maintainability 
benefits from the separation of these two knowledge bases. Algorithm extensibility is enhanced by model- 
based approaches and knowledge base maintainability is enhanced by the logical separation of context- 
sensitive and context-insensitive knowledge. 

6.3.3.3.4 Efficiency 

Algorithm efficiency measures the relative performance of algorithms with respect to computational and/or 
search requirements. Although exceptions exist, for complex, real-world problem solving, the following 
generalizations often apply: 

• Model-based reasoning tends to be more efficient than non-model-based reasoning. 

• Multiple level-of-abstraction reasoning tends to be more efficient than single level-of-abstraction 
reasoning. 

The general characteristics of the 16 canonical forms are summarized in Figure 6.8. 



6.4 Observations 



This chapter concludes with five general observations pertaining to data fusion automation. 



6.4.1 Observation 1 

Attempts to automate many complex, real-world fusion tasks face a considerable challenge. One obvious 
explanation relates to the disparity between manual and algorithmic approaches to data fusion. For 
example, humans 

• Are adept at model-based reasoning (which supports robustness and extensibility), 

• Naturally employ domain knowledge to augment formally supplied information (which supports 
context-sensitivity), 

• Update or modify existing beliefs to accommodate new information as it becomes available (which 
supports dynamic reasoning), 

• Intuitively differentiate between context-sensitive and context-insensitive knowledge (which sup- 
ports maintainability), 

• Control the analysis process in a highly focused, often top-down fashion (which enhances efficiency). 

As a consequence, manual approaches to data fusion tend to be inherently dynamic, robust, context- 
sensitive, and efficient. Conversely, traditional paradigms used to implement data fusion algorithms have 
tended to be inherently static, nonrobust, non-context-sensitive, and inefficient. Many data fusion prob- 
lems exhibit complex, and possibly dynamic, dependencies among relevant features, advocating the 
practice of 

• Relying more on the higher order problem solving forms, 

• Applying a broader range of supporting databases and reasoning knowledge, 

• Utilizing more powerful, global control strategies. 

6.4.2 Observation 2 

Although global phenomena naturally require global analysis, local phenomena can benefit from both a 
local and a global analysis perspective. As a simple example, consider the target track assignment process, 
which is typically treated as a strictly local analysis task. With a conventional canonical form I approach to target 
tracking, track assignment is based on recent, highly local behavior (often assuming a Markov process). 
For ground-based objects, a vehicle’s historical trajectory and its maximum performance capabilities 
provide rather weak constraints on future target motion. A “road-constrained target extrapolation strategy,” 
for example, provides much stronger constraints on ground-vehicle motion than a purely statistical- 
based approach. As a result, the latter tends to generate highly under-constrained solutions. 
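
The contrast between the two extrapolation strategies can be illustrated with a minimal sketch. The road polyline, the vehicle state, and both function names below are hypothetical; the sketch assumes the vehicle stays on a known road and travels at constant speed, which is only one of many possible road-constrained models.

```python
import math

def constant_velocity_predict(pos, vel, dt):
    """Purely kinematic extrapolation: continue along the current velocity vector."""
    return (pos[0] + vel[0] * dt, pos[1] + vel[1] * dt)

def road_constrained_predict(road, dist_along, speed, dt):
    """Advance speed * dt along a road polyline, starting at arc length dist_along."""
    target = dist_along + speed * dt
    for (x0, y0), (x1, y1) in zip(road, road[1:]):
        seg_len = math.hypot(x1 - x0, y1 - y0)
        if seg_len > 0 and target <= seg_len:
            frac = target / seg_len
            return (x0 + frac * (x1 - x0), y0 + frac * (y1 - y0))
        target -= seg_len
    return road[-1]  # the prediction ran past the end of the known road

# Hypothetical road with a 90-degree bend; vehicle heading east at 10 m/s.
road = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0)]
pos, vel = (90.0, 0.0), (10.0, 0.0)

cv = constant_velocity_predict(pos, vel, 3.0)          # leaves the road at the bend
rc = road_constrained_predict(road, 90.0, 10.0, 3.0)   # follows the road around the bend
```

In this sketch the kinematic prediction overshoots the bend, while the road-constrained prediction places the vehicle on the second road segment.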

Although applying nearby domain constraints could adequately explain the local behavior of an object 
(e.g., constant velocity travel along a relatively straight, level road), a more global viewpoint is required 
to interpret global behavior. Figure 6.9 demonstrates local (i.e., concealment, minimum terrain gradient, 
and road seeking), medium-level (i.e., river-crossing and road-following), and global (i.e., reinforce at 
unit) interpretations of a target’s trajectory over space and time. The development and maintenance of 
such a multiple level-of-abstraction perspective is a critical underlying requirement for automating the 
situation awareness development process. 

6.4.3 Observation 3 

Production systems have historically performed better against static, well-behaved, finite-state diagnostic- 
like problems than against problems that possess complex dependencies and exhibit dynamic, time- 
varying behavior. These shortcomings occur because such systems rely on rigid, single level-of-abstraction 
control that is often insensitive to domain context. Despite this fact, during the early 1990s, expert systems 
were routinely applied to dynamic, highly context-sensitive problem domains, often with disappointing 
results. 

The lesson to be learned is that both the strengths and limitations of a selected problem-solving 
paradigm must be fully understood by the algorithm developer from the outset. When an appropriately 
constrained task was successfully automated using an expert system approach, developers often found 
that the now well-understood problem could be more efficiently implemented using another paradigm. 
In such cases, better results were obtained by using either an alternative canonical form IX or X problem- 
solving approach or a lower-order, non-model-based approach. 

FIGURE 6.9 Multiple level-of-abstraction situation understanding. 

When an expert system proved to be inadequate for handling a given problem, artificial neural systems 
were often seen as an alternative or preferred approach. Neural networks require no programming; 
therefore, the paradigm appeared ideal for handling ill-defined or poorly understood problems. While 
expert systems could have real-time performance problems, artificial neural systems promised high 
performance hardware implementations. In addition, the adaptive nature of the neural net learning 
process often seemed to match real-world, dynamically evolving problem-solving requirements. However, 
most artificial neural systems operate more like a statistical or fuzzy pattern recognizer than as a sophis- 
ticated reasoning system capable of generalization, reasoning by analogy, and abstract inference. As 
indicated by the reasoning class taxonomy, while expert systems represent a lower-order model-based 
reasoning approach, a neural network represents the lowest-order non-model-based reasoning approach. 

6.4.4 Observation 4 

Radar systems typically employ a single statistical-based algorithm for tracking air targets, regardless of 
whether an aircraft is flying at an altitude of 20 kilometers or just above tree-top level. Likewise, such 
algorithms are generally insensitive as to whether the target is a high performance fighter aircraft or a 
relatively low speed helicopter. Suppose a nonfriendly high-performance reconnaissance aircraft is flying 
just above a river as it snakes through a mountainous region. There exist a wide range of problems 
associated with tracking such a target, including dealing with high clutter return, terrain masking, and 
multipath effects. In addition, an airborne radar system may have difficulty tracking the target as a result 
of high acceleration turns associated with an aircraft following a highly irregular surface feature. The 
inevitable track loss and subsequent track fragmentation errors typically would require intervention by 
a radar analyst. Tracking helicopters can be equally problematic. Although they fly more slowly, such 
targets can hover, fly below tree-top level, and execute rapid directional changes. 

Tracking performance can potentially be improved by making the tracking analysis sensitive to target 
class-specific behavior, as well as to constraints posed by the domain. For example, the recognition that 
the aircraft is flying just above the terrain suggests that surface features are likely to influence the target’s 
trajectory. When evaluated with respect to “terrain feature-following models,” the trajectory would be 
discovered to be highly consistent with a “river-following flight path.” Rather than relying on past behavior 
to predict future target positions, a tracking algorithm could anticipate that the target is likely to continue 
to follow the river. 

In addition to potentially improving tracking performance, the interpretation of sensor-derived data 
within context also permits more abstract interpretations. If the aircraft were attempting to avoid radar 
detection by one or more nearby surface-to-air missile batteries, a nap-of-the-earth flight profile could 
indicate hostile intent. Even more global interpretations can be hypothesized. Suppose a broader view 
of the “situation picture” reveals another unidentified aircraft operating in the vicinity of the river- 
following target. By evaluating the apparent coordination between the two aircraft, the organization and 
mission of the target group can be conjectured. For example, if the second aircraft begins jamming 
friendly communication channels just as the first aircraft reaches friendly airspace, the second aircraft’s 
role can be inferred to be “standoff protection for the primary collection or weapon delivery aircraft.” 
The effective utilization of relevant domain knowledge and physical domain constraints offers the poten- 
tial for developing both more effective and higher level-of-abstraction interpretations of sensor-derived 
information. 

6.4.5 Observation 5 

Indications and warnings, as well as many other forms of expectation-based analysis, have traditionally 
relied on relatively rigid doctrinal and tactical knowledge. However, contemporary data fusion applica- 
tions often must support intelligence applications where flexible, ill-defined, and highly creative tactics 
and doctrine are employed. Consequently, the credibility of any analysis that relies on rigid expectation- 
based behavior needs to be carefully scrutinized. Although the lack of strong, reliable a priori knowledge 
handicaps all forms of expectation-based reasoning, the use of relevant logical, physical, and logistical 
context at least partially compensates for the lack of more traditional problem domain constraints. 

7.1 Introduction 



A broad consensus holds that a probabilistic approach to evidence accumulation is appropriate because 
it enjoys a powerful theoretical foundation and proven guiding principles. Nevertheless, many would 
argue that probability theory is not suitable for practical implementation on complex real-world prob- 
lems. Further debate arises when considering people’s subjective opinions regarding events of interest. 
Such debate has resulted in the development of several alternative approaches to combining evidence. 1-3 
Two of these alternatives, possibility theory (or fuzzy logic) 4-6 and belief theory (or Dempster-Shafer 
theory), 7-10 have each achieved a level of maturity and a measure of success to warrant their comparison 
with the historically older probability theory. 

This chapter first provides some background on each of the three approaches to combining evidence 
in order to establish notation and to collect summary results about the approaches. Then an example 
system that accumulates evidence about the identity of an aircraft target is introduced. The three methods 
of combining evidence are applied to the example system, and the results are contrasted. At this point, 
possibility theory is dropped from further consideration in the rest of the chapter because it does not 
seem well suited to the sequential combination of information that the example system requires. Finally, 
an example data fusion system is constructed that determines the presence and location of mobile missile 
batteries. The evidence is derived from multiple sensors and is introduced into the system in temporal 
sequence, and a software component approach is adopted for its implementation. Probability and belief 
theories are contrasted within the context of the example system. 

One key idea that emerges for simplifying the solution of complex, real-world problems involves 
collections of spaces. This is in contradistinction to collections of events in a common space. Although 
the spaces are all related to each other, considering each space individually proves clearer and more 
manageable. The relationships among the spaces become explicit by considering some as fundamental 
representations of what is known about the physical setting of a problem, and others as arising from 
observation processes defined at various knowledge levels. 

The data and processes employed in the example system can be encapsulated in a component-based 
approach to software design, regardless of the method adopted to combine evidence. This leads naturally 
to an implementation within a modern distributed processing environment. 

Contrasts and conclusions are stated in Section 7.4. 

7.2 Alternative Approaches to Combine Evidence 

Probability is much more than simply a relative frequency. Rather, there is an axiomatic definition 11 of 
probability that places it in the general setting of measure theory. As a particular measure, it has been crafted 
to possess certain properties that make it useful as the basis for modeling the occurrence of events in various 
real-world settings. Some critics (fuzzy logicians among them) have asserted that probability theory is too 
weak to include graded membership in a set; others have asserted that probability cannot handle non- 
monotonic logic. In this chapter, both of these assertions are demonstrated by example to be unfounded. 
This leads to the conclusion that fuzzy logic and probability theory have much in common, and that they 
differ primarily in their methods for dealing with unions and intersections of events (characterized as sets). 
Other critics have asserted that probability theory cannot account for imprecise, incomplete, or inconsistent 
information. Evidence is reviewed in this chapter to show that interval probabilities can deal with imprecise 
and incomplete information in a natural way that explicitly keeps track of what is known and what is not 
known. The collection of spaces concept (developed in Section 7.3) provides an explicit means that can be 
used with any of the approaches to combine evidence to address the inconsistencies. 

7.2.1 The Probability Theory Approach 

The definition of a probability space tells what properties an assignment of probabilities must possess, 
but it does not indicate what assignment should be made in a specific setting. The specific assignment 
must come from our understanding of the physical situation being modeled, as shown in Figure 7.1. The 
definition tells us how to construct probabilities for events that are mutually exclusive (i.e., their set 
representations are disjoint). Generally speaking, when collections of events are not mutually exclusive, 
a new collection of mutually exclusive events (i.e., disjoint sets) must first be constructed. 

FIGURE 7.1 The comparison of predictions with measurements places probability models on firm scientific 
ground. 

Consider the desirable properties for measuring the plausibility of statements about some specific 
experimental setting. Given that 

1. The degree of plausibility can be expressed by a real number, 

2. The extremes of the plausibility scale must be compatible with the truth values of logic, 

3. An infinitesimal increase in the plausibility of statement A implies an infinitesimal decrease in the 
plausibility of the statement not-A, 

4. The plausibility of a statement must be independent of the order in which the terms of the 
statement are evaluated, 

5. All available evidence must be used to evaluate plausibility, and 

6. Equivalent statements must have the same plausibility, 

then the definition of a probability space follows as a logical consequence. 12 Further, the definition implies 
that the probability measure has properties (1) through (6). Hence, any formalism for measuring the 
plausibility of statements must necessarily be equivalent to the probability measure, or it must abandon 
one or more of the properties listed. 

7.2.1.1 Apparent Paradoxes and the Failure of Intuition 

Some apparent paradoxes about probability theory reappear from time to time in various forms. Two 
will be discussed — Bertrand’s paradox and Hughes’ paradox. A dice game that cannot be lost is then 
described. This will help to make the point that human intuition can fail with regard to the outcome of 
probability-space models. A failure of intuition is probably the underlying reason for the frequent 
underestimation of the power of the theory. 

7.2.1.1.1 Bertrand’s Paradox 

Bertrand’s paradox 13 begins by imagining that lines are drawn at random to intersect a circle to form 
chords. Suppose that the coordinates of the center of the circle and the circle’s radius are known. The 
length of each chord can then be determined from the coordinates of the midpoint of the chord, which 
might be assumed to be uniformly distributed within the circle. The length of each chord can also be 
determined from the distance from the center of the chord to the center of the circle, which might be 
assumed to be uniformly distributed between zero and the radius of the circle. The length of each chord 
can also be determined from the angle subtended by the chord, which might be assumed to be uniformly 
distributed between 0 and 180 degrees. The length of each chord is certainly the same, regardless of the 
method used to compute it. 

Bertrand asked, “What is the probability that the length of a chord will be longer than the side of an 
inscribed equilateral triangle?” Three different answers to the question appear possible depending on 
which of the three assumptions is made. How can that be if the lengths must be the same? A little reflection 
reveals that the lengths may indeed be the same when determined by each method, but that assumptions 
have been made about three different related quantities, none of which is directly the length. In fact, the 
three quantities cannot simultaneously be distributed in the same way. Which one is correct? Jaynes 14 
has shown that only the assumption that the distance from the center of the chord to the center of the circle 
is uniformly distributed between zero and the radius provides 
an answer that is invariant under infinitesimal translations and rotations. 
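
The three candidate answers can be checked numerically. The following is a minimal Monte Carlo sketch for a unit circle, not part of the original analysis; the function names, trial count, and seed are illustrative assumptions. For a unit circle the side of the inscribed equilateral triangle is the square root of 3, and the three assumptions give probabilities near 1/4, 1/2, and 1/3, respectively.

```python
import math
import random

SIDE = math.sqrt(3.0)  # side of the equilateral triangle inscribed in a unit circle

def chord_from_midpoint_in_disc(rng):
    """Chord whose midpoint is uniformly distributed over the disc (rejection sampling)."""
    while True:
        x, y = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        d2 = x * x + y * y
        if d2 <= 1.0:
            return 2.0 * math.sqrt(1.0 - d2)

def chord_from_distance(rng):
    """Chord whose distance from the circle's center is uniform on [0, 1]."""
    d = rng.uniform(0.0, 1.0)
    return 2.0 * math.sqrt(1.0 - d * d)

def chord_from_angle(rng):
    """Chord whose subtended angle is uniform on [0, 180] degrees."""
    theta = rng.uniform(0.0, math.pi)
    return 2.0 * math.sin(theta / 2.0)

def estimate(sampler, trials=100_000, seed=0):
    """Fraction of sampled chords longer than the triangle side."""
    rng = random.Random(seed)
    return sum(sampler(rng) > SIDE for _ in range(trials)) / trials

p_midpoint = estimate(chord_from_midpoint_in_disc)
p_distance = estimate(chord_from_distance)
p_angle = estimate(chord_from_angle)
```

Each sampler computes the same quantity, the chord length, but from a different uniformly distributed variable, which is why the three estimates differ.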

Bertrand’s paradox touches on the principle of indifference: if no reason exists for believing that any 
one of n mutually exclusive events is more likely than any other, a probability of 1/n is assigned to each 
event. This is a valid principle, but it must be applied with caution to avoid pitfalls. Suppose, for instance, 
four cards — two black and two red — are shuffled and placed face down on a table. Two cards are 
picked at random. What is the probability they are the same color? One person reasons, “They are either 
both black, or they are both red, or they are different; in two cases the colors are the same, so the answer 
is 2/3.” A second person reasons, “No, the cards are either the same or they are different; the answer is 1/2.” 
They are both wrong, as shown in Figure 7.2. There is simply no substitute for careful analysis. 
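
The careful analysis amounts to enumerating the equally likely outcomes. A short sketch follows; the card labels are arbitrary, and the enumeration assumes every unordered pair of cards is equally likely to be picked.

```python
from fractions import Fraction
from itertools import combinations

# Two black and two red cards; the first character of each label is the color.
cards = ["B1", "B2", "R1", "R2"]

pairs = list(combinations(cards, 2))                 # all unordered pairs of cards
same_color = [p for p in pairs if p[0][0] == p[1][0]]

p_same = Fraction(len(same_color), len(pairs))       # exact probability
```

The indifference principle applies to the six card pairs, not to the coarser descriptions “same” and “different.”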
FIGURE 7.2 No matter which two cards one picks, P(same color) = 1/3. 

7.2.1.1.2 Hughes’ Paradox 

The Hughes paradox arose in the context of pattern recognition studies during the late 1960s and early 
1970s. Patterns were characterized as vectors, and rules to decide a pattern’s class membership were 
studied using a collection of samples of the patterns. The collection size was held constant. The perfor- 
mance of a decision rule was observed experimentally to often improve as the dimension of the pattern 
vectors increased — up to a point. The performance of the decision rule decreased beyond that point. 
This led some investigators to conclude that there was an optimal dimension for pattern vectors. However, 
most researchers believed that the performance of a Bayes-optimal classifier never decreases as the 
dimension of the pattern vectors increases. This can be attributed to the fact that a Bayes-optimal decision 
rule, if given irrelevant information, will just throw the information away. (See, for example, the “theorem 
of irrelevance.” 15) The confusion was compounded by the publication of Hughes’ paper, 16 which seemed 
to prove that an optimal dimension existed for a Bayes classifier. As a basis for his proof, Hughes 
constructed a monotonic sequence of data quantizers that provided the Bayes classifier with a finer 
quantization of the data at each step. Thus, the classifier dealt with more data at each step of the sequence. 
Hughes thought that he had constructed a sequence of events in a common probability space. However, 
he had not; he had constructed a sequence of probability spaces. 17 Because the probability-space definition 
was changing at each step of the sequence, the performance of a Bayes classifier in one space was not 
simply related to the performance of a Bayes classifier in another space. There was no reason to expect 
that the performances would be monotonically related in the same manner as the sequence of classifiers. 
This experience sheds light on how to construct rules to accumulate evidence in data fusion systems: 
accumulating evidence can change the underlying probability-space model in subtle ways for which 
researchers must account. 

7.2.1.1.3 A Game That Can’t Be Lost 

This next example demonstrates that people do not have well-developed intuition about what can happen 
in probability spaces. Given the four nonstandard, fair, six-sided dice shown in Figure 7.3, play the 
following game. First, pick one of the dice. Then have someone else pick one of the remaining three. 
Both of you roll the die that you have selected; the one with the higher number face up wins. You have 
the advantage, right? Wrong! No matter which die you pick, one of the remaining three will win at this 
game two times out of three. Call the dice A, B, C, and D. A beats B with probability 2/3, B beats C with 
probability 2/3, C beats D with probability 2/3, and D beats A with probability 2/3 — much like the 
childhood game rock-scissors-paper, this game involves nontransitive relationships. People typically think 
about “greater than” as inducing a transitive relation among ordinary numbers. Their intuition fails when 
operating in a domain with nontransitive relations. 18 In this sense, probability-space models can deal 
with non-monotonic logic. 

FIGURE 7.3 These dice (reported in 1970 by Martin Gardner to be designed by Bradley Efron at Stanford University) 
form the basis of a game one cannot lose. 
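
The 2/3 winning probabilities can be verified by enumerating all 36 face pairings for each matchup. The face values below are the commonly published Efron dice; because the figure itself is not reproduced here, treat them as an assumption about its contents.

```python
from fractions import Fraction
from itertools import product

# Assumed face values (the commonly published Efron dice).
dice = {
    "A": (4, 4, 4, 4, 0, 0),
    "B": (3, 3, 3, 3, 3, 3),
    "C": (6, 6, 2, 2, 2, 2),
    "D": (5, 5, 5, 1, 1, 1),
}

def p_beats(x, y):
    """Exact probability that die x shows a higher face than die y."""
    wins = sum(a > b for a, b in product(dice[x], dice[y]))
    return Fraction(wins, 36)

# Each die in the cycle beats the next one, and D beats A.
cycle = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
results = {pair: p_beats(*pair) for pair in cycle}
```

Whichever die the first player picks, the second player can choose its predecessor in the cycle and win two times out of three.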

The point of this section is to emphasize that the physical situation at hand is critically important. 
Considerable work may be required to construct an accurate probability model, but the effort can be 
very rewarding. The power of probability theory is that it tells us how to organize and quantify what is 
known in terms that lead to minimizing the expected cost of making decisions. 

7.2.1.2 Observation Processes and Random Variables 

In many physical settings, an item of interest cannot be directly accessed. Instead, it can only be indirectly 
observed. For example, the receiver in a communication system must observe a noise-corrupted modu- 
lated signal for some interval of time to decide which message was sent. Based on the sampling theorem, 
a received signal can be characterized completely by a vector of its samples taken at an appropriate rate. 
The sample vectors are random vectors; their components are joint random variables. The random 
variables of interest arise from some well-defined observation processes implemented as modules in a 
data fusion system. It is important to be precise about random variables that can characterize observation 
processes. 

Formally, a random variable is a measurable function defined on a sample space (e.g., $f: S \to \mathbb{R}$ or 
$f: S \to \mathbb{R}^n$, indicating scalar or vector random variables taking values on the real line or its extension 
to n-dimensional space). The probability distribution on the random variable is induced by assigning to 
each subset of $\mathbb{R}$ ($\mathbb{R}^n$), termed events, the same probability as the subset of S that corresponds to the inverse 
mapping from the event-subset to S. This is the formal definition of a measurable function. In Figure 7.4, 
the event, B, occurs when the random variable takes on values in the indicated interval on the real line. 
The image of B under the inverse mapping is a subset of S, called B'. This results in P(B) = P(B'), even 
though B and B' are in different spaces. 
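
A small finite example may make the induced distribution concrete. The sample space (one fair die), the function f, and the event B below are all hypothetical choices for illustration.

```python
from fractions import Fraction

# Hypothetical sample space S: outcomes of one fair six-sided die.
P = {s: Fraction(1, 6) for s in range(1, 7)}

def f(s):
    """Illustrative random variable mapping S to the real line."""
    return s % 3

def inverse_image(event):
    """B' = set of outcomes in S that f maps into the event B."""
    return {s for s in P if event(f(s))}

def induced_probability(event):
    """P(B) is defined as the probability of the inverse image B' in S."""
    return sum(P[s] for s in inverse_image(event))

def in_B(value):
    """Event B on the real line: the interval [1, 2]."""
    return 1 <= value <= 2

B_prime = inverse_image(in_B)
p_B = induced_probability(in_B)
```

Here B is a set of real numbers and B' is a set of die outcomes, yet both carry the same probability.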

The meaning of this notation when observation processes are involved should be emphasized. If the 
set, A, in Ω represents an event defined on the sample space, and if the set, B, in $\mathbb{R}$ represents an event 
defined on the real line through a random variable, then one set must be mapped into a common space 
with the other. This enables a meaningful discussion about the set $\{f(A) \,\&\, B\}$, or about the set $\{A \,\&\, f^{-1}(B)\}$. 
The joint events [A & B] can similarly be discussed, taking into consideration the meaning in terms of the 
set representations of those events. In other words, $P[A \,\&\, B] = P[\{f(A) \,\&\, B\}] = P[\{A \,\&\, f^{-1}(B)\}]$. Note 
that even when a collection of sets, $A_i$, for i = 1, 2, ..., partitions some original sample space, the images 
of those sets under the observation mapping, $f(A_i)$, will not, in general, partition the new sample space. 
In this way, probability theory clearly accommodates concepts of measurement vectors belonging to a 
set (representing a cause) with graded membership. 
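
The following sketch illustrates how the images of a partition can overlap. The sample space, the two causes, and the observation mapping are hypothetical; the "grade" computed here is simply the conditional probability of each cause given the observed value, which is one reasonable reading of graded membership.

```python
from fractions import Fraction

# Hypothetical sample space with equally likely outcomes.
P = {s: Fraction(1, 6) for s in range(1, 7)}

# Two causes that partition the sample space.
causes = {"A1": {1, 2, 3}, "A2": {4, 5, 6}}

def f(s):
    """Illustrative observation mapping: outcomes 1-2, 3-4, 5-6 map to 1, 2, 3."""
    return (s + 1) // 2

# Images of the causes in the observation space.
images = {name: {f(s) for s in subset} for name, subset in causes.items()}
overlap = images["A1"] & images["A2"]

def membership(name, value):
    """Conditional probability of a cause given that f took the observed value."""
    joint = sum(P[s] for s in causes[name] if f(s) == value)
    total = sum(P[s] for s in P if f(s) == value)
    return joint / total

grades = {name: membership(name, 2) for name in causes}
```

The observed value 2 lies in the image of both causes, so it belongs to each with grade 1/2 rather than to exactly one of them.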
FIGURE 7.4 (a) Forward mappings and (b) inverse mappings relate the sample space to an observation space. 



7.2.1.3 Bayes’ Theorem 

There may be modules in a data fusion system that observe the values of random variables (or vectors) 
and compute the probability that the observed values have some particular cause. The causes partition 
their sample space. Bayes’ theorem is employed to compute the probability of each possible cause, given 
some observation event. Suppose $A_1, A_2, \ldots, A_n$ form a collection of subsets of S (representing causes) that 
partition S. Then for any observation event, B, with P(B) > 0, 

$$P(A_i \mid B) = \frac{P(B \mid A_i)\,P(A_i)}{P(B)} \tag{7.1}$$

and 

$$P(B) = \sum_{i=1}^{n} P(B \mid A_i)\,P(A_i) \tag{7.2}$$

The quantities $P(A_i \mid B)$ and $P(B \mid A_i)$ are termed conditional probabilities; the quantities $P(A_i)$ and P(B) 
are termed marginal probabilities. The quantities $P(B \mid A_i)$ and $P(A_i)$ are termed a priori probabilities because 
they represent statements that can be made prior to knowing the value of any observation. Again, note 
that Bayes’ theorem remains true for events represented by elements of Ω, as well as for random events 
defined through an observation process. This can cause some confusion. The original sample space and 
the observation space are clearly related, but they are separate probability spaces. Knowing which space 
you are operating in is important. 
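
As a minimal numerical sketch of Bayes’ theorem over a partition of causes, the function below computes P(B) by total probability and then the probability of each cause given B. The function name and the prior and likelihood values are illustrative assumptions only.

```python
def posterior(priors, likelihoods):
    """Apply Bayes' theorem over a partition of causes.

    priors[i]      is P(A_i), the a priori probability of cause i
    likelihoods[i] is P(B | A_i), the probability of the observation given cause i
    Returns the list of P(A_i | B) and the marginal P(B).
    """
    # Total probability of the observation event B.
    p_b = sum(l * p for l, p in zip(likelihoods, priors))
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    # Bayes' theorem for each cause.
    return [l * p / p_b for l, p in zip(likelihoods, priors)], p_b

# Hypothetical three-cause example.
priors = [0.5, 0.3, 0.2]
likelihoods = [0.9, 0.5, 0.1]
post, p_b = posterior(priors, likelihoods)
```

Because the causes partition the sample space, the resulting conditional probabilities sum to one.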

Note that Bayes’ theorem assumes that some event is given (i.e., it has unequivocally occurred). Often 
this is not the case in a data fusion system. Suppose, for example, that an event, E, is observed with 
confidence 0.9. This could be interpreted to mean that E has occurred with probability 0.9, and that its 
alternatives occur with a combined probability of 0.1. Assuming two alternatives, A1 and A2, interval 
probabilities can be employed to conclude that E occurred with probability 0.9, A1 occurred with 
probability x, and A2 occurred with probability 0.1 - x, where 0 < x < 0.1. Finally, assuming that one of 
the possible causes of the observed events is C, and noting that a true conditioning event does not yet 
exist, a superposition of probability states can be defined. Thus, combining the results from using Bayes’ 
theorem on each of the possible observed events and weighting them together gives 

$$P(C) = (0.9)\,P(C \mid E) + (x)\,P(C \mid A1) + (0.1 - x)\,P(C \mid A2) \tag{7.3}$$

where 0 < x < 0.1. 
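
Because the weighted sum is linear in x, the resulting interval for P(C) is bounded by its values as x approaches 0 and 0.1. The sketch below computes those bounds; the function name and the three conditional probabilities are hypothetical, and the endpoints are included as limiting values.

```python
def interval_probability(p_c_given_e, p_c_given_a1, p_c_given_a2, confidence=0.9):
    """Bounds on P(C) when E is observed with the given confidence.

    The remaining probability (1 - confidence) is split between the
    alternatives A1 and A2 in an unknown proportion x.
    """
    rest = 1.0 - confidence

    def p_c(x):
        # Weighted superposition of the three conditional probabilities.
        return (confidence * p_c_given_e
                + x * p_c_given_a1
                + (rest - x) * p_c_given_a2)

    # Linear in x, so the extremes occur at the ends of the range of x.
    ends = (p_c(0.0), p_c(rest))
    return min(ends), max(ends)

# Hypothetical conditional probabilities of cause C.
low, high = interval_probability(0.8, 0.3, 0.1)
```

With these values the interval is approximately 0.73 to 0.75; its width reflects only the uncertainty about how the residual 0.1 is divided between A1 and A2.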

This particular form of motivation for the resulting probability interval does not seem to appear in 
the substantial literature on interval probabilities. Yet, it has a nice physical feel to it. To compensate for 
the uncertainty of not knowing the current state, an interval probability is created from a superposition 
of possible event states. As addressed in the next section of this chapter, enough evidence may later be 
accumulated to “pop” the superposition and declare with acceptable risk that a true conditioning event 
has occurred. 

7.2.1.4 Bayes-Optimal Data Fusion 

The term Bayes-optimal means minimizing risk, where risk is defined to be the expected cost associated 
with decision-making. There are costs associated with correct decisions, as well as with incorrect decisions, 
and typically some types of errors are more costly than others. Those who must live with the decisions 
made by the system must decide the cost structure associated with any particular problem. Once decided, 
the cost structure influences the optimal design through an equation that defines expected cost. To 
simplify notation, just the binary-hypotheses case will be presented; the extension to multiple-hypotheses 
is straightforward. 

Suppose there are just two underlying causes of some observations, C1 or C2. Then there are four 
elements to the cost structure: 

1. $C_{11}$, the cost of deciding C1 when really C1 (a correct decision); 

2. $C_{22}$, the cost of deciding C2 when really C2 (another correct decision); 

3. $C_{21}$, the cost of deciding C2 when really C1 (an error; sometimes a miss); and 

4. $C_{12}$, the cost of deciding C1 when really C2 (an error; sometimes a false alarm). 

The expected cost is simply Risk = E{cost} = $C_{11}P_{11} + C_{22}P_{22} + C_{21}P_{21} + C_{12}P_{12}$, where $P_{ij}$ is the 
joint probability of deciding Ci when Cj is the true cause. Suppose the observation process produces a 
measurement vector, X, and define two regions in the associated vector space: $R_1 = \{X \mid \text{decide C1}\}$ and 
$R_2 = \{X \mid \text{decide C2}\}$. Let p(X|C1) denote the conditional probability density function of a specific value 
of the measurement vector given C1. Let p(X|C2) denote the conditional probability density function of a 
specific value of the measurement vector given C2. Let p(X) denote the marginal probability density function 
on the measurement vector. Then, as shown elsewhere,¹⁹ risk is minimized by forming the likelihood ratio and 
comparing it to a threshold: 

Decide C1 if 

$$\frac{p(X \mid C1)}{p(X \mid C2)} > \frac{(C_{12} - C_{22})\,P(C2)}{(C_{21} - C_{11})\,P(C1)} \tag{7.4}$$



otherwise, decide C2. Because applying the same monotonically increasing function to both sides preserves 
the inequality, an equivalent test is (for example) to decide C1 if 

$$\log\frac{p(X \mid C1)}{p(X \mid C2)} > \log\frac{(C_{12} - C_{22})\,P(C2)}{(C_{21} - C_{11})\,P(C1)} \tag{7.5}$$

and decide C2 otherwise. 

An equivalent test that minimizes risk is realized by comparing 
$d(X) = (C_{12} - C_{22})\,P(C2)\,p(X \mid C2) - (C_{21} - C_{11})\,P(C1)\,p(X \mid C1)$ 
to 0. That is, decide C1 if d(X) < 0, and decide C2 otherwise. In some literature,²⁰,²¹ 
d(X) is called a discriminant function; it has been used together with nonparametric estimators (e.g., 
potential functions or Parzen estimators) of the conditional probabilities as the basis for pattern recog- 
nition systems, including neural networks. 
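The likelihood-ratio form of the test and the discriminant form can be compared side by side in a short Python sketch. It is only an illustration under stated assumptions: scalar Gaussian densities stand in for p(X|C1) and p(X|C2), the priors and costs are hypothetical, and errors are assumed to cost more than correct decisions (so that both cost differences are positive). All function names are invented here.

```python
import math


def gaussian_pdf(x, mean, std):
    """Scalar normal density, standing in for p(X|C1) or p(X|C2) (an assumption)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def decide_by_likelihood_ratio(p_x_c1, p_x_c2, prior_c1, prior_c2, c11, c22, c21, c12):
    """Return 'C1' if the likelihood ratio exceeds the cost-weighted threshold."""
    threshold = ((c12 - c22) * prior_c2) / ((c21 - c11) * prior_c1)
    return "C1" if p_x_c1 / p_x_c2 > threshold else "C2"


def discriminant(p_x_c1, p_x_c2, prior_c1, prior_c2, c11, c22, c21, c12):
    """d(X) as a difference of cost-weighted joint densities; negative favors C1."""
    return (c12 - c22) * prior_c2 * p_x_c2 - (c21 - c11) * prior_c1 * p_x_c1


# Hypothetical problem: a false alarm (deciding C1 when really C2) costs five
# times as much as a miss, and the two causes are equally likely a priori.
setup = dict(prior_c1=0.5, prior_c2=0.5, c11=0.0, c22=0.0, c21=1.0, c12=5.0)

for x in (0.0, 0.5):
    p1 = gaussian_pdf(x, mean=0.0, std=1.0)
    p2 = gaussian_pdf(x, mean=2.0, std=1.0)
    by_ratio = decide_by_likelihood_ratio(p1, p2, **setup)
    by_sign = "C1" if discriminant(p1, p2, **setup) < 0 else "C2"
    print(x, by_ratio, by_sign)
```

Both forms give the same decision at every measurement; the asymmetric costs simply move the decision boundary away from the midpoint between the two means.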

An important property of this test in any of its equivalent forms is its ability to optimally combine 
prior information with measurement information. It is perhaps most obvious in the likelihood ratio 
form that relative probability is what is important (how much greater p(X|C1) is than p(X|C2)), 
rather than the specific values of the two conditional probabilities. When one is sufficiently greater than 
the other, there may be acceptable risk in “popping” a superposition of probability states to declare a 
true conditioning event has occurred. Finally, note that in any version of the test, knowledge of the form 
of the optimal decision rule is a focal point and guide to understanding a particular problem domain. 

7.2.1.5 Exploiting Lattice Structure 

Many researchers likely under-appreciate the fact that the lattice structure induced by the event 
relationships within a probability space can be exploited to determine the probability of events, perhaps in 
interval form, from partial information about some of the probabilities. To be precise, consider 
$S = \{x_1, x_2, \ldots, x_N\}$ to be an exhaustive collection of N mutually exclusive (simple, or atomic) events. 
The set $2^S$ is the set of all possible subsets of S. Suppose unnormalized probabilities (e.g., as odds) are 
assigned to M events in $2^S$, say $E_k$ for $k = 1, 2, \ldots, M$, where M may be less than, equal to, or greater 
than N. The next section of this chapter partially addresses the question: under what conditions can the 
probabilities of the $x_j$ be inferred? 

7.2.1.5.1 A Characteristic Matrix 

Consider only the case M = N. Define C, an N × N matrix with elements $c_{kj} = 1$ if $\{x_j\} \subseteq E_k$, and 0 
otherwise. C can be called the characteristic matrix for the $E_k$s. Also, define P, an N × 1 vector with 
elements $p_k$ ($k = 1, 2, \ldots, N$) that are the assigned unnormalized probabilities of $E_k$. From the rule for 
combining probabilities of mutually exclusive events, P = C X, where X is an N × 1 vector with elements 
$P[\{x_j\}]$, some or all of which are unknown. Clearly, $X = C^{-1} P$. For this last equation to be solvable, the 
determinant of C must be nonzero, which means the rows/columns of C are linearly independent. Put 
another way, the collection $\{E_k \mid k = 1, 2, \ldots, N\}$ must “span” the simple events. 
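The characteristic-matrix relation P = C X can be illustrated with a small Python sketch. This is a minimal example, not a definitive implementation: the function name, the three compound events, and their assigned probabilities are hypothetical, and exact fractions are used only to keep the arithmetic transparent.

```python
from fractions import Fraction


def solve_atomic_probabilities(characteristic, compound):
    """Solve P = C X for X by Gauss-Jordan elimination with exact fractions.

    characteristic: N x N list of 0/1 rows; row k marks the atoms in event E_k.
    compound: the N probabilities assigned to the events E_k.
    """
    n = len(characteristic)
    # Build the augmented matrix [C | P].
    aug = [[Fraction(v) for v in row] + [Fraction(str(p))]
           for row, p in zip(characteristic, compound)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("C is singular: the events do not span the simple events")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


# Hypothetical example: E1 = {x1, x2}, E2 = {x1, x3}, E3 = {x2, x3}.
C = [[1, 1, 0],
     [1, 0, 1],
     [0, 1, 1]]
P = [0.5, 0.7, 0.8]
print(solve_atomic_probabilities(C, P))
```

If two of the compound events coincide, the rows of C are linearly dependent and the sketch raises an error, which is the computational counterpart of the events failing to span the simple events.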

7.2.1.5.2 Applicability 

The characteristic matrix defined above provides a mathematically sound, intuitively clear method of 
determining the probabilities of simple events from the probabilities of compound events derived from 
them, including combining evidence across knowledge sources. Bayes’ theorem is not used to obtain any 
of these results, and the question of how to assign prior probabilities does not arise. The lattice structure 
implicit in the definition of probabilities is simply exploited. This concept and its use are discussed later 
in this chapter, where methods of combining evidence are considered. 

7.2.2 The Possibility Theory Approach 

Possibility theory considers a body of knowledge represented as subsets of some established reference 
set, S. (Most often in literature on possibility theory the domain of discourse is denoted $\Omega$. This discussion 
uses S to minimize the introduction of new notation for each approach. It will remain clear that the 
syntax and semantics of possibility theory differ from those of probability theory.) Denote the collection 
of all subsets of S as $\Omega = 2^S$. In the case that S has an infinite number of elements, $\Omega$ denotes a sigma- 
algebra (the definition of $\Omega$ given for probability in Appendix 7.A defines a sigma-algebra). Most of the 
time this chapter restricts S to finite cardinality for reasons of simplicity. This is not a severe restriction, 
since in practical systems the representable body of knowledge will be finite. 

There are two distinguished elements of $\Omega$: the empty set, $\emptyset$, and the set S itself. Let C denote a 
confidence function that maps the elements of $\Omega$ into the interval [0, 1], $C: \Omega \to [0, 1]$. It is required 
that $C(\emptyset) = 0$ and that C(S) = 1. $\emptyset$ can be called the “impossible” or “never true” event, and S can be 
called the “sure” or “always true” event. Note that C(A) = 0 does not imply $A = \emptyset$ and C(A) = 1 does not 
imply A = S, where $A \in \Omega$. 

In order to have a minimum of coherence, any confidence function should be monotonic with respect 
to inclusion, which requires that $A \subseteq B$ implies $C(A) \le C(B)$. This is interpreted to mean that if a first 
event is a restriction of (or implies) a second event, then there is at least as much confidence in the 
occurrence of the second as in the occurrence of the first. Immediate consequences of this monotonicity 
are that $C(A \cup B) \ge \max[C(A), C(B)]$ and $C(A \cap B) \le \min[C(A), C(B)]$. 

The limiting case, $C(A \cup B) = \max[C(A), C(B)]$, can be taken as an axiom that defines a possibility 
measure.²² (Zadeh was the first to use the term possibility measures to describe confidence measures that 
obey this axiom. He denoted them $\Pi(\cdot)$, the convention that is followed in this chapter.) The term 
“possibility” for this limiting case can be motivated, even justified, by the following observations (this 
motivation follows a similar treatment in Dubois and Prade⁵). 

Suppose $E \in \Omega$ is such that C(E) = 1. Define a particular possibility measure as $\Pi_1(A) = 1$ if 
$A \cap E \ne \emptyset$ and 0 otherwise. Then interpret $\Pi_1(A) = 1$ to mean A is possible. Also, since 
$\Pi_1(A \cup \text{not-}A) = \Pi_1(S) = 1$, $\max[\Pi_1(A), \Pi_1(\text{not-}A)] = 1$. Interpret this to mean that of two 
contradictory events, at least one is possible. However, one being possible does not prevent the other from 
being possible, too. This is consistent with the semantics of judged possibilities, which invokes little 
commitment. Finally, $\Pi_1(A \cup B) = \max[\Pi_1(A), \Pi_1(B)]$ seems consistent with notions of physical 
possibility: to realize $A \cup B$ requires only the easiest (i.e., the most possible) of the two to be realized. 

Because “max” is a reflexive, associative, and transitive operator, any possibility measure can be 
represented in terms of the (atomic) elements of S: $\Pi_1(A) = \sup\{\pi_1(a) \mid a \in A\}$, where “sup” stands for 
supremum (that is, for least upper bound), $A \in \Omega$, $a \in S$, and $\pi_1(a) = \Pi_1(\{a\})$. Call $\pi_1(a)$ a possibility 
distribution (defined on S). Consider a possibility distribution to be normalized if there exists at least one 
$a \in S$ such that $\pi_1(a) = 1$. If S is infinite, a possibility distribution exists only if the axiom is extended to 
include infinite unions of events.²³ 

Now take the limiting case $C(A \cap B) = \min[C(A), C(B)]$ as a second axiom of possibility theory, and 
call set functions that satisfy this axiom necessity measures. The term “necessity” for this limiting case 
can be motivated, even justified, by the following observations. 

Suppose $E \in \Omega$ is such that C(E) = 1. Define a particular necessity measure as $N_1(A) = 1$ if $E \subseteq A$, and 
0 otherwise. $N_1(A) = 1$ clearly means that A is necessarily true. The following is easy to verify from the 
definitions: if $\Pi_1(A) = 1$ then $N_1(\text{not-}A) = 0$, and if $\Pi_1(A) = 0$ then $N_1(\text{not-}A) = 1$. Thus, 
$\Pi_1(A) = 1 - N_1(\text{not-}A)$. This is interpreted to mean that if an event is necessary, its contrary is 
impossible, or, conversely, if an event is possible its contrary is absolutely not necessary. This last equation 
expresses a duality between the possible and the necessary, at least for the particular possibility and 
necessity functions used here. 

Because “min” is a reflexive, associative, transitive operator, this duality implies it is always appropriate 
to construct a necessity measure from a possibility distribution: 

$$N_1(A) = \inf\{1 - \pi_1(a) \mid a \notin A\} \tag{7.6}$$

where “inf” stands for infimum (or greatest lower bound). 

Several additional possibility and necessity relationships can be quickly derived from the definitions. 
For example: 

1. $\min[N_1(A), N_1(\text{not-}A)] = 0$ (if an event is necessary its complement is not the least bit necessary). 

2. $\Pi_1(A) \ge N_1(A)$ for all $A \in \Omega$ (an event becomes possible before it becomes necessary). 

3. $\Pi_1(A) + \Pi_1(\text{not-}A) \ge 1$. 

4. $N_1(A) + N_1(\text{not-}A) \le 1$. 

Thus, the relationship between the possibility (or the necessity) of an event and the possibility (or 
necessity) of its contrary is weaker than in probability theory, and both possibility and necessity numbers 
are needed to characterize the uncertainty of an event. However, both probability and possibility can be 
characterized in terms of a distribution function defined on the atomic members of the reference set. 

Now adopt this motivation and justification to call arbitrary functions, $\Pi(A)$ and N(A), possibility 
and necessity functions, respectively, if they satisfy the two axioms given above and can be constructed 
from a distribution, $\pi(a)$, as $\Pi(A) = \sup\{\pi(a) \mid a \in A\}$ and $N(A) = \inf\{1 - \pi(a) \mid a \notin A\}$ for all 
$A \in \Omega$. It is straightforward to show that all the properties defined here for $\Pi_1(A)$ and $N_1(A)$ hold for 
these arbitrary possibility and necessity functions provided that $0 \le \pi(a) \le 1$ for all $a \in S$ and provided 
$\pi(a)$ is normalized (i.e., there exists at least one $a \in S$ such that $\pi(a) = 1$). The properties would have to 
be modified if the distribution function is not normalized. In the sequel it is assumed that possibility 
distribution functions are normalized. 
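The construction of possibility and necessity functions from a normalized distribution can be checked numerically. The Python sketch below is only an illustration: the three-element reference set and its distribution values are hypothetical, and the function names are invented here. It verifies the four listed relationships and the duality between possibility and necessity for every event.

```python
from itertools import combinations


def all_events(universe):
    """Every subset of the reference set S, i.e., every element of 2^S."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def possibility(event, pi):
    """sup{pi(a) | a in A}; the supremum over the empty set is taken as 0."""
    return max((pi[a] for a in event), default=0.0)


def necessity(event, pi):
    """inf{1 - pi(a) | a not in A}; the infimum over the empty set is taken as 1."""
    return min((1.0 - pi[a] for a in pi if a not in event), default=1.0)


# Hypothetical normalized possibility distribution (at least one value equals 1).
pi = {"a": 1.0, "b": 0.7, "c": 0.2}
universe = frozenset(pi)

for A in all_events(universe):
    not_A = universe - A
    assert min(necessity(A, pi), necessity(not_A, pi)) == 0.0
    assert possibility(A, pi) >= necessity(A, pi)
    assert possibility(A, pi) + possibility(not_A, pi) >= 1.0
    assert necessity(A, pi) + necessity(not_A, pi) <= 1.0
    # Duality: the possibility of A is 1 minus the necessity of not-A.
    assert abs(possibility(A, pi) - (1.0 - necessity(not_A, pi))) < 1e-12
```

Removing the normalization (for example, lowering the value for "a" below 1) makes the first check fail, which illustrates why the properties would have to be modified for an unnormalized distribution.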

A relationship exists between possibility theory and fuzzy sets. To understand this relationship, some 
background on fuzzy sets is also needed. 

L. A. Zadeh introduced fuzzy sets in 1965.²⁴ Zadeh noted that there is no unambiguous way to 
determine whether or not a particular real number is much greater than one. Likewise, no unambiguous 
way exists of determining whether or not a particular person is in the set of tall people. Ambiguous sets 
like these arise naturally in our everyday life. The aim of fuzzy set theory is to deal with such situations 
wherein sharply defined criteria for set membership are absent. 

Perhaps the most fundamental aspect of fuzzy sets that differentiates them from ordinary sets is the 
domain on which they are defined. A fuzzy set is a function defined on some (ordinary) set of interest, 
S, termed the domain of discourse. As discussed earlier in this chapter, probability is defined on a collection 
of ordinary sets, $2^S$. This is a profound difference. Measure theory and other topics within the broad area 
of real analysis employ collections of subsets of some given set (such as the natural numbers or the real 
line) in order to avoid logical problems that can otherwise arise.²⁵ 

Another difference between fuzzy sets and probability theory is that fuzzy sets leave vague the meaning 
of membership functions and the operations on membership functions beyond a generalization of the 
characteristic functions of ordinary sets (note that the terms fuzzy set and fuzzy membership function 
refer to the same thing). To understand this, let {x} be the domain from which the elements of an ordinary 
set are drawn. The characteristic function of the ordinary “crisp” set is defined to have value 1 if and 
only if x is a member of the set, and to have the value 0 otherwise. A fuzzy set is defined to have a 
membership function that satisfies $0 \le f(x) \le 1$. In this sense, the characteristic function of the ordinary 
set is included as a special case. However, the interpretation of the fuzzy membership function is 
subjective, rather than precise; some researchers have asserted that it does not correspond to a probability 
interpretation²⁶ (although that assertion is subject to debate). This suggests that fuzzy membership 
functions will prove useful in possibility theory as possibility distribution functions, but not directly as 
possibility measures. 

Operations on fuzzy sets are similarly motivated by properties of characteristic functions. Table 7.1 
summarizes the definitions of fuzzy sets, including those that result from operations on one or more 
other fuzzy sets. There, f(x) denotes a general fuzzy set, and $f_A(x)$ denotes a particular fuzzy set, A. “Max” 
and “min” played a role in the initial definition of fuzzy sets. Thus, fuzzy union suggests a possibility 
measure, fuzzy intersection suggests a necessity measure, and if a fuzzy set is thought of as a possibility 
distribution, the connection that f(x) can equal $\Pi(\{x\})$ for $x \in S$ is established. 

Assigning numerical values as the range of a membership function is no longer essential. One 
generalization of Zadeh’s original definition that now falls within possibility theory is the accommodation of 
word labels in the range for a fuzzy membership function. This naturally extends fuzzy sets to include 
language, creating an efficient interface with rule-based expert systems. Architectures created using this 
approach are often referred to as fuzzy controllers.²⁷ Except for this difference in range, the fuzzy sets in 
a fuzzy controller continue to be combined as indicated in Table 7.1. 

TABLE 7.1 Summary Definition of Fuzzy Membership Functions 

Operation              Definition 
Empty Set              $f$ is empty iff $f(x) = 0 \;\forall x$ 
Complement             $f^c = 1 - f(x) \;\forall x$ 
Equality               $f_A = f_B$ iff $f_A(x) = f_B(x)$ 
Inclusion              $f_A \subseteq f_B$ iff $f_A(x) \le f_B(x) \;\forall x$ 
Disjunction            $f_{A \cup B}(x) = \max[f_A(x), f_B(x)]$ 
Conjunction            $f_{A \cap B}(x) = \min[f_A(x), f_B(x)]$ 
Convexity              A is a convex fuzzy set $\Leftrightarrow f_A[k x_1 + (1 - k) x_2] \ge \min[f_A(x_1), f_A(x_2)]$ 
                       for all $x_1$ and $x_2$ in X, and for any constant, k, in the interval [0,1]. 
Algebraic Product      $f_A(x)\, f_B(x)$ 
Algebraic Sum          $f_A(x) + f_B(x) \le 1$ 
Absolute Difference    $|f_A(x) - f_B(x)|$ 
Entropy Ratio          $\int dx\, \min[f_A(x), 1 - f_A(x)] \;/\; \int dy\, \max[f_A(y), 1 - f_A(y)]$ 

7.2.3 The Belief Theory Approach 

Dempster⁷ and Shafer⁸ start with an exhaustive set of mutually exclusive outcomes of some experiment 
of interest, S, and call it the frame of discernment. (In much of the literature on Dempster-Shafer theory, 
the frame of discernment is denoted $\Theta$. This discussion uses S to minimize the introduction of new 
notation for each approach. It will remain clear that the syntax and semantics of belief theory differ from 
those of probability theory.) Dempster and Shafer then form $\Omega = 2^S$, and assign a belief, B(A), to any set 
$A \in \Omega$. (In some literature on belief theory, the set formed is $2^S - \emptyset$, but this can cause confusion and 
makes no difference, as the axioms will show.) The elements of S can be called atomic events; the elements 
of $\Omega$ can be called molecular if they are not atomic. The interpretation of a molecular event is that any 
one of its atomic elements is “in it,” but not in a constructive sense. The evidence assigned to a molecular 
event cannot be titrated; it applies to the molecular event as a whole. The mass of evidence, m(A), is also 
sometimes called the basic probability assignment; it satisfies the following axioms: 

1. $m(\emptyset) = 0$ 

2. $m(A) \ge 0$ for all $A \in \Omega$ 

3. $\sum_{A \in \Omega} m(A) = 1$ 

Note that although these axioms bear some similarity to the axioms for probability, they are not the 
same. Belief and probability are not identical. The crucial difference is that axiom 3 equates unity with 
the total accumulated evidence assigned to all elements of $\Omega$, whereas an axiom of probability equates 
unity with the probability of $S \in \Omega$. Belief theorists interpret S to mean a state of maximal ignorance, and 
the evidence for S is transferable to other elements of $\Omega$ as knowledge becomes manifest, that is, as ignorance 
diminishes. Hence, in the absence of any evidence, in a state of total ignorance, assign m(S) = 1 and to 
all other elements of $\Omega$ assign a mass of 0. In time, as knowledge increases, some other elements of $\Omega$ 
will have assigned nonzero masses of evidence. Then, if m(A) > 0 for some $A \in \Omega$ other than S, m(S) < 1 in 
accord with the reduction of ignorance. This ability of belief theory to explicitly deal with ignorance is often 
cited as a useful property of the approach. However, this property is not unique to belief theory.²⁸ 

Belief theory further defines a belief function in terms of the mass of evidence. The mass of evidence 
assigned to a particular set is committed exactly to the set, and not to any of the constituent elements 
of the set. Therefore, to obtain a measure of total belief committed to the set, add the masses of evidence 
associated with all the sets that are subsets of the given set. For all sets A and B in $\Omega$, define 

$$\mathrm{Bel}(A) = \sum_{B \subseteq A} m(B) \tag{7.7}$$



Given Bel(A), m(B) can be recovered as follows: 

$$m(B) = \sum_{A \subseteq B} (-1)^{|B - A|}\, \mathrm{Bel}(A) \tag{7.8}$$

Thus, given either representation, the other can be recovered. This transform pair is known as the 
Möbius transformation.²⁹ 
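The transform pair of Equations 7.7 and 7.8 can be exercised on a small frame of discernment. The following Python sketch is a minimal illustration, assuming the usual reading of the pair (a sum of masses over subsets, and an alternating sum of beliefs over subsets); the masses and function names are hypothetical.

```python
from itertools import combinations


def subsets(s):
    """All subsets of the set s, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def belief_from_mass(mass, frame):
    """Equation 7.7: Bel(A) is the sum of m(B) over all B contained in A."""
    return {A: sum(m for B, m in mass.items() if B <= A) for A in subsets(frame)}


def mass_from_belief(bel, frame):
    """Equation 7.8: m(B) is an alternating sum of Bel(A) over all A contained in B."""
    return {B: sum((-1) ** len(B - A) * bel[A] for A in subsets(B))
            for B in subsets(frame)}


frame = frozenset({"x", "y", "z"})
# Hypothetical masses of evidence; unlisted sets carry zero mass.
mass = {frozenset({"x"}): 0.2, frozenset({"x", "y"}): 0.3, frame: 0.5}

bel = belief_from_mass(mass, frame)
recovered = mass_from_belief(bel, frame)
for B in subsets(frame):
    print(sorted(B), round(bel[B], 3), round(recovered[B], 3))
```

The recovered masses match the assigned ones (up to floating-point rounding), including the zero masses on sets that received no evidence.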

A set $B \in \Omega$ is a focal element of the belief system if m(B) > 0. (Confusingly, some authors seem to equate 
the focal elements of a belief system with the atomic events. That definition would not be sufficient to 
obtain the results cited here.) The union of all the focal elements of a belief system is called the core of 
the belief system, denoted C. It should be apparent that Bel(A) = 1 if and only if $C \subseteq A$. It should also 
be apparent that if all the focal elements are atomic events, then Bel(A) is the classical probability measure 
defined on S. It is this last property that leads some authors to assert that belief theory (or Dempster- 
Shafer theory) is a generalization of probability theory. However, a generalization should also be expected 
to do something the other cannot do, and this has not been demonstrated. Indeed, Dempster explicitly 
acknowledges that there are stronger constraints on belief theory than on probability theory.⁵,²³,³⁰ 
Dempster was well aware that his rule of combination (still to be discussed) leads to more constrained results 
than probability theory, but he preferred it because it allows an artificial intelligence system to get started 
with zero initial information about priors. 

This belief function has been called the credibility function, denoted Cr(A), and also the support for 
A, denoted Su(A). In the sequel, Su(A) will be used in keeping with the majority of the engineering 
literature. By duality, a plausibility function, denoted Pl(A), can be defined in terms of the support 
function: 



$$\mathrm{Pl}(A) = 1 - \mathrm{Su}(\text{not-}A) = 1 - \sum_{B \subseteq (S - A)} m(B) = 1 - \sum_{A \cap B = \emptyset} m(B) = \sum_{A \cap B \ne \emptyset} m(B) \tag{7.9}$$



Thus, the plausibility of A is 1 minus the sum of the mass of evidence assigned to all the elements of $\Omega$ 
that have an empty intersection with A. Equivalently, it is the sum of the mass of evidence assigned to 
all the elements of $\Omega$ that have a nonempty intersection with A. An example should help to solidify these 
definitions. Suppose S = {x, y, z}. Then $\Omega = \{\emptyset, \{x\}, \{y\}, \{z\}, \{x,y\}, \{x,z\}, \{y,z\}, S\}$. The credibility and 
the plausibility of all the elements of $\Omega$ can be computed by assigning a mass of evidence to some of the 
elements of $\Omega$ as shown in Table 7.2. 

For any set $A \in \Omega$, $\mathrm{Su}(A) \le \mathrm{Pl}(A)$, $\mathrm{Su}(A) + \mathrm{Su}(\text{not-}A) \le 1$, and $\mathrm{Pl}(A) + \mathrm{Pl}(\text{not-}A) \ge 1$. 
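Support and plausibility, and the inequalities just stated, can be computed directly from a mass assignment. The Python sketch below assumes S = {x, y, z} with hypothetical masses; the function names are invented for this illustration.

```python
def support(event, mass):
    """Su(A): total mass of evidence committed to subsets of A."""
    return sum(m for focal, m in mass.items() if focal <= event)


def plausibility(event, mass):
    """Pl(A): total mass of evidence on sets having a nonempty intersection with A."""
    return sum(m for focal, m in mass.items() if focal & event)


frame = frozenset({"x", "y", "z"})
# Hypothetical masses of evidence; unlisted sets carry zero mass.
mass = {frozenset({"x"}): 0.2, frozenset({"x", "y"}): 0.3, frame: 0.5}

for name in ("x", "y", "z"):
    A = frozenset({name})
    not_A = frame - A
    su, pl = support(A, mass), plausibility(A, mass)
    # Duality: plausibility of A is 1 minus the support of not-A.
    assert abs(pl - (1.0 - support(not_A, mass))) < 1e-12
    assert su <= pl
    assert su + support(not_A, mass) <= 1.0 + 1e-12
    assert pl + plausibility(not_A, mass) >= 1.0 - 1e-12
    print(name, su, pl)
```

For these masses, the atom x has nonzero support because evidence was committed to it exactly, while y and z have zero support but nonzero plausibility because the evidence on {x, y} and on S is still free to move to them.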

The relationship between support and plausibility leads to the definition of an interval, [Su(A), Pl(A)]. 
What is the significance of this interval? The support of a proposition can be interpreted as the total 
mass of evidence that has been transferred to the proposition, whereas the plausibility of the proposition 
can be interpreted as the total mass of evidence that has either already been transferred to the proposition 
or is still free to transfer to it. Thus, the interval spans a spectrum of belief from that which is already 
available to that which may yet become available given the information at hand. 

TABLE 7.2 An Example Clarifying Belief System Definitions 

Event A      Mass of Evidence m(A)       Support $\sum_{B \subseteq A} m(B)$      Plausibility $1 - \sum_{A \cap B = \emptyset} m(B)$ 
$\emptyset$          0                           0                            0 
{x}          $m_x$                       $m_x$                        $1 - m_y - m_z - m_{yz}$ 
{y}          $m_y$                       $m_y$                        $1 - m_x - m_z - m_{xz}$ 
{z}          $m_z$                       $m_z$                        $1 - m_x - m_y - m_{xy}$ 
{x,y}        $m_{xy}$                    $m_x + m_y + m_{xy}$         $1 - m_z$ 
{x,z}        $m_{xz}$                    $m_x + m_z + m_{xz}$         $1 - m_y$ 
{y,z}        $m_{yz}$                    $m_y + m_z + m_{yz}$         $1 - m_x$ 
S            $1 - \sum$ (all other masses)   1                            1 

7.2.4 Methods of Combining Evidence 

Each of the three theories just reviewed has its own method of combining evidence. This section provides 
an example problem as a basis of comparison (this example follows Blackman¹⁰). Suppose there are four 
possible targets operating in some area, which are called $t_1$, $t_2$, $t_3$, and $t_4$. Suppose $t_1$ is a friendly 
interceptor (fighter aircraft), $t_2$ is a friendly bomber, $t_3$ is a hostile interceptor, and $t_4$ is a hostile bomber. 

7.2.4.1 Getting Started 

This is enough information to begin to define a probability space. Define $S = \{t_1, t_2, t_3, t_4\}$ and form 
$\Omega = 2^S$. Clearly, $\emptyset \in \Omega$ and $P[\emptyset] = 0$. Also, $S \in \Omega$ and P[S] = 1 (i.e., one of the targets will be 
observed because S is exhaustive). 

This provides enough information for a possibility system to establish its universe of discourse, 
$S = \{t_1, t_2, t_3, t_4\}$. However, there is no clearly defined way to characterize the initial ignorance of which 
target may be encountered. Note that there is not a constraint of the form $f_0(S) = 1$. A possible choice is 
$f_0(x) = 1/2$ if $x \in S$, and 0 otherwise, corresponding to an assignment of membership equal to nonmembership 
for each of the possible targets about which no information is initially available. Another possible choice 
is $f_0(x) = 0$, corresponding to an assignment of the empty set to characterize that no target is present 
prior to the receipt of evidence. As noted above, {Low, Medium, High} could also be chosen as the range, 
and $f_0(x)$ = Low could be assigned for all x in S. In order to be concrete, choose $f_0(x) = 0$. 

This is also enough information to begin to construct a belief system. Accepting this knowledge at 
face value and storing it as a single information string, $\{t_1 \cup t_2 \cup t_3 \cup t_4\}$, with unity belief (which 
implies $P(\Omega) = 1$, as required), minimizes the required storage and computational resources of the system. 

7.2.4.2 Receipt of First Report 

Suppose a first report comes in from a knowledge source that states, “I am 60 percent certain the target 
is an interceptor.” All three systems map the attribute “interceptor” to the set $\{t_1, t_3\}$. 

7.2.4.2.1 Probability Response 

Based on the first report, $P[\{t_1, t_3\} \mid \text{1st report}] = 0.6$. A probability approach requires that 
$P[\text{not}\{t_1, t_3\} \mid \text{1st report}] = 1 - P[\{t_1, t_3\} \mid \text{1st report}]$. The set complement is with respect to S, so 
$P[\{t_2, t_4\} \mid \text{1st report}] = 0.4$. The status of knowledge at this point is summarized on the lattice structure 
based on subsets of S, as shown in Figure 7.5. 

FIGURE 7.5 The lattice imposes exploitable constraints. 

With $p_i = P[\{t_i\}]$, the constraints from this lattice structure lead to the conclusion that: 

$$0 \le p_1 \le 0.6 \tag{7.10}$$

$$0 \le p_2 \le 0.4 \tag{7.11}$$

$$p_3 = 0.6 - p_1 \quad (0 \le p_3 \le 0.6) \tag{7.12}$$

$$p_4 = 0.4 - p_2 \quad (0 \le p_4 \le 0.4) \tag{7.13}$$



7.2.4.2.2 Possibility Response 

The possibility system interprets the message as saying that $\Pi(\{t_1\} \cup \{t_3\}) = 0.6 = \max[\Pi(\{t_1\}), \Pi(\{t_3\})]$. 
This implies only that $0 \le \Pi(\{t_1\}) \le 0.6$ and $0 \le \Pi(\{t_3\}) \le 0.6$, and does not express any constraining 
relationship between $\Pi(\{t_1\})$ and $\Pi(\{t_3\})$. Because $S = \{t_1, t_3\} \cup \{t_2, t_4\}$ and $\Pi(S) = 1$, 
$\max[\Pi(\{t_1, t_3\}), \Pi(\{t_2, t_4\})] = \max[0.6, \Pi(\{t_2, t_4\})] = 1$, which implies that $\Pi(\{t_2, t_4\}) = 1$. From this, 
the conclusion is reached that $0 \le \Pi(\{t_2\}) \le 1$ and $0 \le \Pi(\{t_4\}) \le 1$, again without any constraint between 
$\Pi(\{t_2\})$ and $\Pi(\{t_4\})$. This contributes little. 

7.2.4.2.3 Belief Response 

Since the reported information is not certain regarding whether or not the target is an interceptor, the 
belief system transfers some of the mass of evidence from S as follows: $m_1(S) = 0.4$ and $m_1(\{t_1, t_3\}) = 0.6$ 
(all the others are assigned zero). From these the support and the plausibility for these two propositions 
can be computed, as shown in Table 7.3. This is all that can be inferred; no other conclusions can be 
drawn at this point. 

7.2.4.3 Receipt of Second Report 

Next, a second report comes in from a knowledge source that states, “I am 70 percent sure the target is 
hostile.” All three systems map the attribute “hostile” to the set $\{t_3, t_4\}$. 



TABLE 7.3 Belief Support and Plausibility 

Event              Support    Plausibility 
$\{t_1, t_3\}$       0.6        1.0 
S                  1.0        1.0 

7.2.4.3.1 Probability Combination 

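At this point, $P[\{t_3, t_4\} \mid \text{2nd report}] = 0.7$ and $P[\{t_1, t_2\} \mid \text{2nd report}] = 0.3$. Thus, the following 
probabilities for six possible pairings of four targets result from the second report: 

$$P[\{t_1, t_2\}] = 0.3 \tag{7.14}$$

$$P[\{t_1, t_3\}] = 0.6 \tag{7.15}$$

$$P[\{t_1, t_4\}] = 1 - x \tag{7.16}$$

$$P[\{t_2, t_3\}] = x \tag{7.17}$$

$$P[\{t_2, t_4\}] = 0.4 \tag{7.18}$$

$$P[\{t_3, t_4\}] = 0.7 \tag{7.19}$$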



Now, using Equations 7.14, 7.15, and 7.17 (because this choice involves only three unknowns; any such 
choice necessarily provides the same answers), together with a characteristic matrix, gives 

$$\begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix} \begin{bmatrix} P[\{t_1\}] \\ P[\{t_2\}] \\ P[\{t_3\}] \end{bmatrix} = \begin{bmatrix} 0.3 \\ 0.6 \\ x \end{bmatrix} \tag{7.20}$$

and any standard technique for matrix inversion can be used to obtain 

$$\begin{bmatrix} P[\{t_1\}] \\ P[\{t_2\}] \\ P[\{t_3\}] \end{bmatrix} = 0.5 \begin{bmatrix} 1 & 1 & -1 \\ 1 & -1 & 1 \\ -1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 0.3 \\ 0.6 \\ x \end{bmatrix} \tag{7.21}$$

This leads to the conclusions that 

$$P[\{t_1\}] = 0.45 - (x/2) \tag{7.22}$$

$$P[\{t_2\}] = (x/2) - 0.15 \tag{7.23}$$

$$P[\{t_3\}] = 0.15 + (x/2) \tag{7.24}$$

$$P[\{t_4\}] = 0.55 - (x/2) \tag{7.25}$$

FIGURE 7.6 Even partial information supports a preliminary decision (“•” marks the mid-range of x). 

The fact that these are all probabilities constrains x to the range $0.3 \le x \le 0.9$, which, in turn, implies 

$$0.0 \le P[\{t_1\}] \le 0.3 \tag{7.26}$$

$$0.0 \le P[\{t_2\}] \le 0.3 \tag{7.27}$$

$$0.3 \le P[\{t_3\}] \le 0.6 \tag{7.28}$$

$$0.1 \le P[\{t_4\}] \le 0.4 \tag{7.29}$$



This is all that can be deduced until at least one additional piece of independent information arrives. 
However, even this partial information supports a preliminary decision, as shown in Figure 7.6. 
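The interval results can be reproduced numerically from the four atomic probabilities expressed as functions of x. The Python sketch below is only a check under stated assumptions: it takes the expressions of Equations 7.22 through 7.25 as given, scans x on a grid whose spacing (0.001) is an arbitrary choice, and uses function names invented here.

```python
def atomic_probabilities(x):
    """Atomic probabilities as functions of the free parameter x (Equations 7.22-7.25)."""
    return {
        "t1": 0.45 - x / 2,
        "t2": x / 2 - 0.15,
        "t3": 0.15 + x / 2,
        "t4": 0.55 - x / 2,
    }


def is_feasible(x, tol=1e-12):
    """True if every atomic probability lies in [0, 1], up to rounding."""
    return all(-tol <= p <= 1.0 + tol for p in atomic_probabilities(x).values())


# Scan candidate values of x and keep those that yield valid probabilities.
grid = [i / 1000 for i in range(1001)]
feasible = [x for x in grid if is_feasible(x)]
x_low, x_high = feasible[0], feasible[-1]

# Range of each atomic probability over the feasible values of x.
bounds = {
    target: (
        min(atomic_probabilities(x)[target] for x in feasible),
        max(atomic_probabilities(x)[target] for x in feasible),
    )
    for target in ("t1", "t2", "t3", "t4")
}

# Atomic probabilities at the mid-range of x.
midpoint = atomic_probabilities((x_low + x_high) / 2)
print(x_low, x_high)
print(bounds)
print(midpoint)
```

At the mid-range of x, the hostile interceptor t3 has the largest of the four atomic probabilities, which is one way to read the preliminary decision that the partial information supports.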

7.2.4.3.2 Possibility Combination 

Possibility theory combines information using min and max beginning with the following observed facts: 
(1) $\Pi[\{t_1, t_3\}] = 0.6$, and (2) $\Pi[\{t_3, t_4\}] = 0.7$. Since $\{t_3\} = \{t_1, t_3\} \cap \{t_3, t_4\}$, $N(\{t_3\}) = \min[0.6, 0.7] = 0.6$. 
However, since there are no defined constraints among the individual possibilities aside from the min-max 
rules of combination, little else can be deduced. Therefore, possibility theory does not seem well 
suited to a sequential combination of information even though it may be an effective way to assign 
measures of belief in other settings (e.g., in fuzzy controllers²⁷). For this reason, there will be no further 
consideration of possibility theory in this chapter. 

7.2.4.3.3 Belief Combination 

The belief system represents the information in the second report as an assignment of masses of evidence 
from a second source: $m_2(S) = 0.3$ and $m_2(\{t_3, t_4\}) = 0.7$. Dempster’s rule is then used to combine the 
masses of evidence from the two independent sources. In this case the calculations are particularly simple, 
as shown in Table 7.4. Again, the support and the plausibility of each of these events can be computed, 
as shown in Table 7.5. 
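The combination of the two reports can be sketched in Python using the standard form of Dempster's rule: multiply the masses of every pair of focal elements, credit each product to the intersection of the pair, and renormalize by the mass that fell on empty intersections. The function and variable names below are invented for this illustration; in this example no pair of focal elements is disjoint, so the normalization has no effect.

```python
def dempster_combine(m1, m2):
    """Combine two mass assignments (dicts from frozenset to mass) by Dempster's rule."""
    combined = {}
    conflict = 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            intersection = a & b
            if intersection:
                combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
            else:
                conflict += mass_a * mass_b  # evidence committed to disjoint sets
    if conflict >= 1.0:
        raise ValueError("the two sources are in total conflict")
    return {event: m / (1.0 - conflict) for event, m in combined.items()}


S = frozenset({"t1", "t2", "t3", "t4"})
m1 = {frozenset({"t1", "t3"}): 0.6, S: 0.4}   # first report: interceptor
m2 = {frozenset({"t3", "t4"}): 0.7, S: 0.3}   # second report: hostile

combined = dempster_combine(m1, m2)
for event, m in sorted(combined.items(), key=lambda item: (len(item[0]), sorted(item[0]))):
    print(sorted(event), round(m, 2))
```

Only four sets receive mass after combination, and only one of them, {t3}, is an individual target.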

Obviously Dempster’s rule does not directly inform belief in the presence of individual targets, yet the 
physical situation often presents only one target. This and related phenomena have led some authors to 
point out that belief theory is weak in its ability to result in a decision.²⁸ 

TABLE 7.4 Application of Dempster’s Rule 

                              $m_2(\{t_3, t_4\}) = 0.7$        $m_2(S) = 0.3$ 
$m_1(\{t_1, t_3\}) = 0.6$       $m(\{t_3\}) = 0.42$            $m(\{t_1, t_3\}) = 0.18$ 
$m_1(S) = 0.4$                $m(\{t_3, t_4\}) = 0.28$       $m(S) = 0.12$ 



TABLE 7.5 Combined Support and Plausibility 

Event              Support    Plausibility 
$\{t_3\}$            0.42       1.0 
$\{t_1, t_3\}$       0.60       1.0 
$\{t_3, t_4\}$       0.70       1.0 
S                  1.0        1.0 



7.2.4.4 Inconsistent Evidence 

Inconsistency is said to occur when one knowledge source assigns a mass of evidence to one event (set), 
a second knowledge source assigns a mass of evidence to a different event (set), and the two events have 
nothing in common — the intersection of their set representations is null. This situation has not so far 
arisen in the target identification example considered above and will now be introduced. 

Suppose a third report comes in that states “I am 80 percent sure one of the known targets is present: 
$\{t_2\}$ is twice as likely as $\{t_1\}$; $\{t_3\}$ is three times as likely as $\{t_1\}$; and $\{t_4\}$ is twice as likely as $\{t_1\}$.” 

7.2.4.4.1 Probability Resolution 

This third report calls into question whether or not the probability space so far constructed to model 
the situation is accurate. Do the four targets represent an exhaustive set of outcomes, or don’t they? One 
possibility is that other target types are possible; another possibility is that there really is no target present. 
So the probability space must be modified to consider the possibility that something else can happen 
and guarantee that the atomic events really exhaustively span all possible outcomes. Therefore, define an 
additional atomic event, O, called “other” to denote the set of whatever other undifferentiated possibilities 
there might be. The probability system then represents the third report as stating that P[{t_1}] = 0.1; 
P[{t_2}] = 0.2; P[{t_3}] = 0.3; P[{t_4}] = 0.2; and P[O] = 0.2. Now in order to combine this report with the 
earlier reports, the earlier probability results must be mapped to the new probability space just defined; 
otherwise they are simply incommensurate. 

For example, based on the first report, P[{t_1,t_3} | 1st report] = 0.6. A probability approach requires that 
P[not {t_1,t_3} | 1st report] = 1 - P[{t_1,t_3} | 1st report]. The set complement is with respect to S, so if S includes 
O, P[{t_2,t_4,O} | 1st report] = 0.4. Similarly, when P[{t_3,t_4} | 2nd report] = 0.7 and S includes O, 
P[{t_1,t_2,O} | 2nd report] = 0.3. This requires a complete new analysis that obviates the analysis reported in 
Sections 7.2.4.2.1 and 7.2.4.3.1, above, and there is not sufficient space in this chapter to do it again. Suffice it to say that 
the results from the two messages agree qualitatively with the third message, but there are quantitative 
disparities. Utility theory has been developed to address such situations. A rational person is expected 
to choose an alternative that has the greatest utility. So, how can utility be assigned to these two 
alternatives? One way is to equate utility with the number of corroborating reports; this is appropriate 
if all data sources have equal veracity. Since two reports are consistent for the first alternative, and only 
one report is self-consistent for the second alternative, the data fusion system would prefer the first 
alternative if this utility function is adopted. Utility theory also offers the means to create a linear 
combination of the alternatives, and the number of corroborating reports can again be used to form the 
weights. The computations are omitted. 

7.2.4.4.2 Belief Resolution 

The belief system represents the third report as stating that m[{t_1}] = 0.1; m[{t_2}] = 0.2; m[{t_3}] = 0.3; 
m[{t_4}] = 0.2; and m[S] = 0.2. The assignment of a mass of evidence to S accounts for the uncertainty 
as to whether or not one of the known targets is actually present. The Dempster-Shafer rule of combination 
applies as before, but with one modification. When two pieces of evidence are inconsistent, the product 
of their masses of evidence is assigned to a measure of inconsistency, termed k. The results from this first 
part of the procedure are shown in Table 7.6. 

TABLE 7.6 The Dempster-Shafer Rule Applied to Inconsistent Evidence 

                      m_{1,2}[{t_3}] = 0.42    m_{1,2}[{t_1,t_3}] = 0.18    m_{1,2}[{t_3,t_4}] = 0.28    m_{1,2}[S] = 0.12
m_3[S] = 0.2          m[{t_3}] = 0.084         m[{t_1,t_3}] = 0.036         m[{t_3,t_4}] = 0.056         m[S] = 0.024
m_3[{t_1}] = 0.1      k = 0.042                m[{t_1}] = 0.018             k = 0.028                    m[{t_1}] = 0.012
m_3[{t_2}] = 0.2      k = 0.084                k = 0.036                    k = 0.056                    m[{t_2}] = 0.024
m_3[{t_3}] = 0.3      m[{t_3}] = 0.126         m[{t_3}] = 0.054             m[{t_3}] = 0.084             m[{t_3}] = 0.036
m_3[{t_4}] = 0.2      k = 0.084                k = 0.036                    m[{t_4}] = 0.056             m[{t_4}] = 0.024

FIGURE 7.7 Support-plausibility intervals result from combining three bodies of evidence. 

The next step is to sum all the corresponding elements of the matrix. Thus, for example, the total 
mass of evidence assigned to inconsistency, k, is 0.042 + 0.028 + 0.084 + 0.036 + 0.056 + 0.084 + 0.036 = 
0.366. Finally, divide the summed masses of evidence by the normalizing factor (1 - k), which has the 
value 0.634 in this example. The results for individual targets follow: m[{t_1}] = (0.018 + 0.012)/0.634 = 
0.047; m[{t_2}] = 0.024/0.634 = 0.038; m[{t_3}] = (0.084 + 0.084 + 0.126 + 0.054 + 0.036)/0.634 = 0.606; 
and m[{t_4}] = (0.056 + 0.024)/0.634 = 0.126. The resulting Support-Plausibility intervals are diagrammed 
in Figure 7.7. 
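The two-step procedure (multiply and intersect, then normalize by 1 - k) can be sketched in Python as follows. The function name `dempster_combine`, the helper `t`, and the frozenset encoding are assumptions made for illustration; the input masses are those of the combined first two reports and of the third report.

```python
from itertools import product

def dempster_combine(m_a, m_b):
    """Combine two mass assignments; return (normalized masses, inconsistency k)."""
    raw = {}
    k = 0.0
    for (set_a, mass_a), (set_b, mass_b) in product(m_a.items(), m_b.items()):
        common = set_a & set_b
        if common:
            raw[common] = raw.get(common, 0.0) + mass_a * mass_b
        else:
            k += mass_a * mass_b  # null intersection: inconsistent evidence
    # Assumes k < 1, i.e., the two sources are not totally inconsistent.
    combined = {focal: mass / (1.0 - k) for focal, mass in raw.items()}
    return combined, k

def t(*names):
    """Shorthand for building a focal set from target names."""
    return frozenset(names)

S = t("t1", "t2", "t3", "t4")
m12 = {t("t3"): 0.42, t("t1", "t3"): 0.18, t("t3", "t4"): 0.28, S: 0.12}
m3 = {t("t1"): 0.1, t("t2"): 0.2, t("t3"): 0.3, t("t4"): 0.2, S: 0.2}

m123, k = dempster_combine(m12, m3)
print("k =", round(k, 3))
for name in ("t1", "t2", "t3", "t4"):
    print(name, round(m123[t(name)], 3))
```

The normalized masses, including those remaining on {t_1,t_3}, {t_3,t_4}, and S, sum to one.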



7.3 An Example Data Fusion System 

The characterization of components needed in a data fusion system begins with standard techniques, 
such as structured, object-oriented, or component-based analysis. A complete analysis is beyond the 
scope of this chapter; however, the following example should help clarify and demonstrate the concepts 
discussed herein. The first step in any method of analyzing system requirements is to establish the system 
context. The system context is summarized in a context diagram that represents a jumping-off point for 
the abstract decomposition that follows. 

FIGURE 7.8 The Level 0 diagram establishes the system boundary and clarifies what is considered to be inside or 
outside the system. 

7.3.1 System Context 

Suppose that an adversarial ground force armed with mobile ground-to-ground missiles has been 
deployed to harass a friendly force in a fixed location within a contested region. The friendly force is 
supported by an all-source intelligence center that provides target location data to a fire control system 
and to an air tasking order system. The fire control system directs artillery, while the air tasking order 
system automatically requests air support. The data fusion system resides within the all-source intelligence 
center, and is required to 

• Interface with other elements within the center that provide signals intelligence (SIGINT) messages, 
measurement and signature intelligence (MASINT) messages, and imagery intelligence 
(IMINT) reports; 

• Analyze the messages received from those elements to determine the presence and location of the 
mobile missile launchers; 

• Report those locations and any other available information about the status of the located launch- 
ers to a human analyst who will determine the optimal response to the threat posed by the 
launchers. 

The human controls the follow-on flow of location and status information to either the fire control 
system or the air tasking order system. 

The system context is summarized in the context diagram, which in structured analysis is known as 
the Level 0 diagram and is shown in Figure 7.8. Level 0 establishes the system boundary and clarifies 
what information is regarded as being internal or external to the system. 

7.3.1.1 Intelligence Preparation of the Battlefield 

Suppose intelligence preparation of the battlefield (IPB) has estimated the following composition of 
mobile missile batteries operating in the contested region: 

• 12 batteries, each with 1 vehicle of type 1 (V1): 
    • 10 with 3 vehicles of type 2 (V2) 
    • 2 with 2 V2 
    • 8 with 3 vehicles of type 3 (V3) 
    • 3 with 2 V3 
    • 1 with 1 V3 

• 11 of the V1 have SIGINT emitter type 1 (E1) and 6 of the V1 have SIGINT emitter type 2 (E2); 
all 12 V1 have at least one of these two types of emitters. When V1 has both emitter types, only 
one emitter is on at a time, and each is used half the time. 

• 24 of the V2 have SIGINT emitter type 3 (E3) and 17 of them have E2; all 34 V2 have at least one 
of these two types of emitters. When V2 has both emitter types, only one emitter is on at a time, 
and each is used half the time. 

• 22 of the V3 have E1 and 19 of them have E3; all 31 V3 have at least one of these two types of 
emitters. When V3 has both emitter types, only one emitter is on at a time, and each is used half the 
time. 

• Image reports (IMINT) correctly identify vehicle type 98 percent of the time. 

• V1 yield IR signature type 1 (IR1) 10 percent of the time; IR signature type 2 (IR2) 60 percent of 
the time; and no IR signature (NoIR) 30 percent of the time. 

• V2 yield IR1 80 percent of the time, IR2 10 percent of the time, and NoIR 10 percent of the time. 

• V3 yield IR1 10 percent of the time, IR2 40 percent of the time, and NoIR 50 percent of the time. 

• Batteries are composed of vehicles arrayed within a radius of 1 kilometer centered on V1. 
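The IPB counts above determine how many vehicles carry one or both emitter types, and how often each emitter is the one radiating. The following Python sketch works this out by inclusion-exclusion; the names `split_counts`, `configs`, and `p_emitter` are hypothetical and chosen only for this illustration.

```python
def split_counts(total, with_a, with_b):
    """Return (only A, both, only B) when every vehicle carries A, B, or both."""
    both = with_a + with_b - total  # inclusion-exclusion
    return with_a - both, both, with_b - both

# Vehicle totals implied by the battery composition.
vehicles = {"V1": 12 * 1, "V2": 10 * 3 + 2 * 2, "V3": 8 * 3 + 3 * 2 + 1 * 1}

# (emitter A, vehicles with A, emitter B, vehicles with B) for each vehicle type.
emitters = {
    "V1": ("E1", 11, "E2", 6),
    "V2": ("E2", 17, "E3", 24),
    "V3": ("E1", 22, "E3", 19),
}

configs = {}    # (only A, both, only B) per vehicle type
p_emitter = {}  # fraction of the time each emitter is on, per vehicle type
for v, (a, n_a, b, n_b) in emitters.items():
    only_a, both, only_b = split_counts(vehicles[v], n_a, n_b)
    configs[v] = (only_a, both, only_b)
    # A vehicle with both emitters radiates each one half the time.
    p_emitter[v] = {a: (only_a + 0.5 * both) / vehicles[v],
                    b: (only_b + 0.5 * both) / vehicles[v]}

print(vehicles)
print(configs)
```

For V1, for example, this gives 6 vehicles with E1 only, 5 with both, and 1 with E2 only, so E1 is the active emitter on a V1 with relative frequency (6 + 2.5)/12 = 8.5/12.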

7.3.1.2 Initial Estimates 

The example experiment involves receiving an intelligence report (i.e., a report that an emitter or an IR 
signature has been detected) and then determining the vehicle type. There are 77 vehicles. The IPB 
estimates given in Section 7.3.1.1 indicate that there are nine configurations of vehicle/emitter, as shown 
in Table 7.7. Furthermore, there are nine configurations of vehicle/IR-signature, as listed in Table 7.8. 



TABLE 7.7 Nine Vehicle/Emitter Configurations 

Config. No.    Vehicle/Emitter Configuration    Quantity
1              V1 with E1                       6
2*             V1 with E1 and E2                5
3              V1 with E2                       1
4              V2 with E2                       10
5*             V2 with E2 and E3                7
6              V2 with E3                       17
7              V3 with E1                       12
8*             V3 with E1 and E3                10
9              V3 with E3                       9
Total                                           77

* Note: Each emitter is on half the time, one at a time. 



TABLE 7.8 Nine Vehicle/IR-Signature Configurations 

Config. No.    Vehicle/IR-Signature Configuration    Quantity
1              V1 with IR1                           1.2
2              V1 with IR2                           7.2
3              V1 with NoIR                          3.6
4              V2 with IR1                           27.2
5              V2 with IR2                           3.4
6              V2 with NoIR                          3.4
7              V3 with IR1                           3.1
8              V3 with IR2                           12.4
9              V3 with NoIR                          15.5
Total                                                77

7.3.1.2.1 Initial Probability Estimates 

These considerations (and others) lead to the following prior probabilities: 

P[E1|V1] = 8.5/12        (7.30)
P[E1|V2] = 0             (7.31)
P[E1|V3] = 17/31         (7.32)
P[E2|V1] = 3.5/12        (7.33)
P[E2|V2] = 13.5/34       (7.34)
P[E2|V3] = 0             (7.35)
P[E3|V1] = 0             (7.36)
P[E3|V2] = 20.5/34       (7.37)
P[E3|V3] = 14/31         (7.38)
P[IR1|V1] = 0.1          (7.39)
P[IR1|V2] = 0.8          (7.40)
P[IR1|V3] = 0.1          (7.41)
P[IR2|V1] = 0.6          (7.42)
P[IR2|V2] = 0.1          (7.43)
P[IR2|V3] = 0.4          (7.44)
P[NoIR|V1] = 0.3         (7.45)
P[NoIR|V2] = 0.1         (7.46)
P[NoIR|V3] = 0.5         (7.47)
P[V1|IMINT] = 0.98       (7.48)
P[V2|IMINT] = 0.98       (7.49)
P[V3|IMINT] = 0.98       (7.50)
P[IR1] = 31.5/77         (7.51)
P[IR2] = 23/77           (7.52)
P[NoIR] = 22.5/77        (7.53)
P[V1] = 12/77            (7.54)
P[V2] = 34/77            (7.55)
P[V3] = 31/77            (7.56)
P[E1] = 25.5/77          (7.57)
P[E2] = 17/77            (7.58)
P[E3] = 34.5/77          (7.59)



From these, the initial values of the posterior probabilities can be computed using Bayes’ rule. In 
anticipation of an example that follows, examine P[V1|E1], P[V1|E2], and P[V1|E3] (other initial posterior 
probabilities can, of course, be computed in a similar manner): 

P[V1|E1] = P[E1|V1] * P[V1]/P[E1] = (8.5/12) * (12/77) * (77/25.5) = 0.333        (7.60)

P[V1|E2] = P[E2|V1] * P[V1]/P[E2] = (3.5/12) * (12/77) * (77/17) = 0.206          (7.61)

P[V1|E3] = P[E3|V1] * P[V1]/P[E3] = 0                                             (7.62)
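The same Bayes' rule computation can be applied to every vehicle and emitter pair. The Python sketch below is illustrative only; the dictionary layout and the function name `posterior` are assumptions, while the numbers are the priors of Equations 7.30 through 7.38 and 7.54 through 7.56.

```python
# P[V], Equations 7.54 through 7.56.
prior_v = {"V1": 12 / 77, "V2": 34 / 77, "V3": 31 / 77}

# P[E | V], Equations 7.30 through 7.38.
likelihood = {
    "E1": {"V1": 8.5 / 12, "V2": 0.0, "V3": 17 / 31},
    "E2": {"V1": 3.5 / 12, "V2": 13.5 / 34, "V3": 0.0},
    "E3": {"V1": 0.0, "V2": 20.5 / 34, "V3": 14 / 31},
}

def posterior(emitter):
    """Return P[V | emitter] for every vehicle type using Bayes' rule."""
    joint = {v: likelihood[emitter][v] * p for v, p in prior_v.items()}
    evidence = sum(joint.values())  # P[emitter], by total probability
    return {v: j / evidence for v, j in joint.items()}

for e in ("E1", "E2", "E3"):
    print(e, {v: round(p, 3) for v, p in posterior(e).items()})
```

The denominator computed inside `posterior` reproduces P[E1] = 25.5/77, P[E2] = 17/77, and P[E3] = 34.5/77.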



7.3.1.2.2 Initial Belief Estimates 

The representation of uncertainty within the Dempster-Shafer approach is an assignment of mass based 
either on observations reported by knowledge sources or on defined rules. Some rules typically come 
from an understanding of the problem domain, such as from IPB. To be concrete, consider the prior 
probabilities to also define a mass of evidence distribution, and from them compute the initial support 
and plausibility values for each event of interest (these computations are omitted to conserve space in 
this chapter). Note, however, that a belief system could express the conditional probability information 
in the IPB in the form of rules. For example, P[V1|IMINT] = 0.98 could be expressed as the rule, “If 
IMINT reports V1, then V1 occurs with mass of evidence 0.98.” Or, “If E2 is reported, then the mass of 
evidence for V3 is zero.” Additional examples are presented in Section 7.3.2.2 below. 

7.3.2 Collections of Spaces 

This section considers how this example data fusion system can most efficiently be constructed. This will 
prove useful when examining the system in operation. This section summarizes some of the aspects of 
human decision making that motivate the use of collections of spaces. It also characterizes the modules 
that can implement the collection-of-spaces approach. 

7.3.2.1 Motivation 

Pearl 31 provided a summary of human performance in decision-making tasks that contrasts with brute-force 
applications of Bayesian theory. This section is based on his ideas. 

The enumeration of all propositions of interest, and all combinations in which they can occur, is 
exponentially complex. This means that practical systems that attempt to define a joint probability 
function in a brute force way — by listing arguments in a table and trying to manipulate the table to 
compute marginal and conditional probabilities — are doomed to fail. In practice, many of the entries 
in such a matrix will be zero — most combinations of evidence never occur in nature. Pearl noted that 
humans seem to counter this complexity by only dealing with a small number of propositions at a time. 
Although humans make probabilistic judgments quickly and reliably when making pair-wise conditional 
statements (such as the likelihood of finding a target based on observing a certain feature), they estimate 
joint probabilities of many propositions poorly, hesitantly, and only with difficulty. Further, humans may 
be reluctant to estimate even pair-wise conditional statements in numerical terms, but they usually state 
with confidence whether or not two propositions are independent (that is, whether or not one statement 
influences the truth of the other). Even three-way dependency statements (e.g., measurement M implies 
target presence given condition C) are handled with confidence and consistency. 

This suggests that the fundamental building blocks of human knowledge are not exhaustive entries in 
a table to estimate joint probabilities. Instead, human knowledge builds on low-order marginal and 
conditional probabilities defined over small clusters of propositions. Notions of dependence within 
clusters and of independence between clusters seem basic to human reasoning. Our limited short-term 
memory and narrow focus of attention seem to imply that “... we reason over fairly local domains 
incrementally along parallel pathways whose structure implicitly codes information at the knowledge 
level itself.” 31 This apparent manner in which humans manage the complexity of decision making in real 
world settings can be captured in an approach that unites the concept of collections of probability spaces 
with Bayesian methods. 

7.3.2.2 Component-Based Implementation 

A software component is a unit of software with the following characteristics: (1) it is discrete and 
functionally well defined; (2) it has standardized, clear, and usable interfaces to its methods; and (3) it 
runs in a container, either with other components or as a stand-alone entity. 32-34 A component may 
contain object classes, methods, and data that can be reused in a manner similar to the reuse of hardware 
components of a system (although a component need not be object-oriented). The conceived collection- 
of-spaces component constitutes such a reusable component. 

A collection of related spaces imparts a common nature to components of a data fusion system — a 
system that may be distributed. Although the spaces exist at various levels of modeling abstraction and 
observation representation, the common nature provides the foundation for component definition and 
integration, as indicated in Figure 7.9. Each component comprises knowledge and evidence in a local 
domain. A local space and its associated observation processes model the domain. This means that the 
spaces in which the mutually exclusive causes lie are explicitly modeled, and the observation processes 
are analyzed in physical terms to explicitly characterize the evidence that can be measured. 

FIGURE 7.9 Components in the system represent local domains, but they share a common nature. 

This results in C, a computation component with 

• Input — measured evidence in a common observation space (vector or scalar variables or events 
and/or logical assertions in a defined domain); 

• Internal data — prior information (e.g., prior marginal and conditional probabilities), if available, 
and data and knowledge about attributes of each cause; 

• Processing: 

    • Calculate posterior values for occurrence of each cause (e.g., Bayes’ theorem and/or exploit 
      lattice structure, Dempster-Shafer combining of evidence mass, fuzzy logic min-max, and/or 
      rule-based calculation), 

    • Calculate a figure of merit for each cause (e.g., probability interval, support-plausibility interval, 
      or rule-based assertion), 

    • Determine most likely causes from figures of merit, 

    • Associate attributes with likely causes (e.g., from local database or from the properties of an 
      observation process), 

    • Determine routing of output data, 

    • Accept and process feedback; 

• Output — likely or plausible causes with figures of merit (e.g., attributes of causes and routing 
information); 

• Feedback — from other components in the data fusion system (e.g., data and knowledge updates 
and updates affecting prior information). 
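As a rough illustration of how such a component might be organized in software, the Python sketch below maps the input, internal data, processing, output, and feedback items onto a small class. Everything in it is hypothetical: the class name `BayesComponent`, its fields, and the choice of Bayes' theorem as the only processing method are assumptions made for this example, not a prescribed design.

```python
from dataclasses import dataclass, field

@dataclass
class BayesComponent:
    """Hypothetical component C over a set of mutually exclusive causes."""
    priors: dict        # internal data: P[cause]
    likelihoods: dict   # internal data: P[observation | cause]
    route_to: list = field(default_factory=list)  # downstream component names

    def process(self, observation):
        """Return (most likely cause, figures of merit, routing) for one input.

        Assumes the observation has nonzero probability under the priors.
        """
        joint = {c: self.likelihoods[observation][c] * p
                 for c, p in self.priors.items()}
        total = sum(joint.values())
        merit = {c: j / total for c, j in joint.items()}  # posterior values
        best = max(merit, key=merit.get)
        return best, merit, list(self.route_to)

    def feedback(self, new_priors):
        """Accept feedback from other components by replacing the priors."""
        self.priors = dict(new_priors)

vehicle = BayesComponent(
    priors={"V1": 12 / 77, "V2": 34 / 77, "V3": 31 / 77},
    likelihoods={"E1": {"V1": 8.5 / 12, "V2": 0.0, "V3": 17 / 31}},
    route_to=["tracker"],  # hypothetical downstream component
)
best, merit, routing = vehicle.process("E1")
print(best, {c: round(p, 3) for c, p in merit.items()}, routing)
```

A belief-based or rule-based component could expose the same `process` and `feedback` methods while computing support-plausibility intervals instead of posterior probabilities.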

7.3.2.3 Component Examples 

Consider as examples two components that share the task of determining the vehicle type based on 
message traffic that flows into the system. 

Both probability and belief systems are built on the foundation of an exhaustive set of possible 
outcomes, S, and go on to consider subsets of S. Therefore, the space defined for any component consists 
of S, Ω (the collection of subsets of S), and the assignment of either an initial probability or an initial 
mass of evidence distribution (and its associated support and plausibility) for as many possible outcomes 
as is feasible. Two examples follow. 

1. S_1 = {V1, V2, V3} because these comprise the totality of observable outcomes with respect to 
vehicle type. There are 2^3 = 8 elements in Ω_1. The initial (prior) probabilities are computed from 
the data stored in the database: 

P[{V1}] = 12/77          (7.63)
P[{V2}] = 34/77          (7.64)
P[{V3}] = 31/77          (7.65)

These same values could be used to initialize a mass of evidence distribution, and the resulting 
support and plausibility could be calculated. 

2. S_2 = {E1, E2, E3} because these comprise the totality of observable outcomes with respect to emitter 
type. There are 2^3 = 8 elements in Ω_2. The initial (prior) probabilities are computed from the data 
stored in the database: 

P[{E1}] = 25.5/77        (7.66)
P[{E2}] = 17/77          (7.67)
P[{E3}] = 34.5/77        (7.68)

These same values could be used to initialize a mass of evidence distribution, and the resulting 
support and plausibility could be calculated. 

7.3.3 The System in Operation 

Now suppose a first message arrives from the SIGINT analyst that states E1 has been identified at location 
(x_1, y_1) at time t_1 with confidence 0.9. Then, suppose a later message arrives from the MASINT analyst 
that states NoIR is detected at location (x_1, y_1) at time t_2 and that t_2 is shortly after t_1. The MASINT 
analyst appends a note that states the IR detector has a miss rate of 0.05. 

7.3.3.1 The Probability System Response 

The emitter probability component employs interval probabilities to update the elements of its probability 
space lattice: state E1 with probability 0.9; state E2 with probability x; state E3 with probability 0.1 - x, 
where 0 ≤ x ≤ 0.1; and the others based on the subset relationships. The vehicle component interprets 
this to mean a superposition of probability states because no true conditioning event yet exists. The 
vehicle component responds by weighting the three states together: 

(0.9) * P[V1|E1] + (x) * P[V1|E2] + (0.1 - x) * P[V1|E3] = (0.9) * (0.333) + (x) * (0.206)        (7.69)

This leads to a range of values that expresses an updated P[V1] at time t_1 and location (x_1, y_1), which is 
denoted as P_{1,1}[V1]: 0.300 ≤ P_{1,1}[V1] ≤ 0.321. Similar ranges are computed for V2 and V3, respectively: 
0.059 ≤ P_{1,1}[V2] ≤ 0.079 and 0.600 ≤ P_{1,1}[V3] ≤ 0.641. The vehicle component represents these as interval 
probabilities assigned to the elements of its probability space lattice, as shown in Figure 7.10. Note that the sum 
of P_{1,1}[Vi] over i must equal 1, and that simultaneous choices exist within these three intervals that satisfy this constraint. 
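The three intervals can be reproduced by evaluating the weighted sum at the two extreme values of x, since the sum is linear in x. The Python sketch below is only an illustration; the table `post` holds the posteriors P[V|E] obtained from Bayes' rule and the IPB priors, and the function name `interval` is an assumption.

```python
# P[V | E] from Bayes' rule applied to the IPB priors.
post = {
    "V1": {"E1": 8.5 / 25.5, "E2": 3.5 / 17, "E3": 0.0},
    "V2": {"E1": 0.0, "E2": 13.5 / 17, "E3": 20.5 / 34.5},
    "V3": {"E1": 17 / 25.5, "E2": 0.0, "E3": 14 / 34.5},
}

def interval(vehicle, p_e1=0.9):
    """Range of the weighted sum as x = P[E2] sweeps from 0 to 1 - p_e1."""
    rest = 1.0 - p_e1
    p = post[vehicle]
    # Linear in x, so the extremes occur at the endpoints x = 0 and x = rest.
    ends = [p_e1 * p["E1"] + x * p["E2"] + (rest - x) * p["E3"]
            for x in (0.0, rest)]
    return min(ends), max(ends)

for v in ("V1", "V2", "V3"):
    low, high = interval(v)
    print(v, round(low, 3), round(high, 3))
```

For any single value of x the three weighted sums add to one, which is the constraint noted above.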

Even the first message results in a preliminary identification of vehicle type. Indeed, if the maximum 
possible values are assigned to P_{1,1}[V1] (0.321) and P_{1,1}[V2] (0.079), then P_{1,1}[V3] = 0.600. The ratio 
P_{1,1}[V3]/P_{1,1}[V1] would equal 0.600/0.321 = 1.87, which can be tested against the threshold defined in 
Section 7.2.4, above: 

$$\text{Decide } V3 \text{ if } 1.87 > \frac{(C_{3,1} - C_{1,1})\,P[V1]}{(C_{1,3} - C_{3,3})\,P[V3]} \qquad (7.70)$$

FIGURE 7.10 The first message results in a preliminary identification of vehicle type. 

Assuming that the costs of mistakes are equal, the threshold becomes P[V1]/P[V3] = 12/31 = 0.387. 
Clearly, 1.87 > 0.387, which leads to a decision with (assumed) acceptable risk that V3 has been detected 
at (x_1, y_1). However, let us postpone this decision to examine the effect that the receipt of additional 
evidence has on the probability intervals. 
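The threshold test can be written out as a small Python sketch. The function name `decide_v3` and the cost arguments are assumptions made for illustration: c_ij stands for the assumed cost of deciding Vi when Vj is actually present, and the default values encode the equal-cost case considered above.

```python
def decide_v3(ratio, prior_v1, prior_v3, c31=1.0, c11=0.0, c13=1.0, c33=0.0):
    """Return (decision, threshold) for the cost-weighted ratio test.

    c_ij is the assumed cost of deciding Vi when Vj is present; the defaults
    make both kinds of mistake equally costly and correct decisions free.
    """
    threshold = ((c31 - c11) * prior_v1) / ((c13 - c33) * prior_v3)
    return ratio > threshold, threshold

ratio = 0.600 / 0.321  # smallest P[V3] over largest P[V1] after the first message
decision, threshold = decide_v3(ratio, prior_v1=12 / 77, prior_v3=31 / 77)
print(round(ratio, 2), round(threshold, 3), decision)
```

Raising the assumed cost of wrongly declaring V3 (c31) raises the threshold and makes the decision harder to reach.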

The IR component employs interval probabilities to update the elements of its probability space lattice: 
state NoIR with probability 0.95; state IR1 with probability x; state IR2 with probability 0.05 - x, where 
0 ≤ x ≤ 0.05; and the others based on the subset relationships. The vehicle component could interpret 
this to mean a superposition of probability states because no true conditioning event yet exists. The 
vehicle component would then respond by weighting the three states together. As before, three ranges of 
values would express the updated probabilities of V1, V2, and V3: 

0.154 ≤ P_{1,2}[V1] ≤ 0.168        (7.71)
0.146 ≤ P_{1,2}[V2] ≤ 0.181        (7.72)
0.661 ≤ P_{1,2}[V3] ≤ 0.683        (7.73)



However, in this case, the vehicle component has already made an initial assessment of vehicle type 
at location (x_1, y_1), and a determination must be made about how the probability intervals evolve in this 
setting where observations arrive in temporal sequence. Computing P[Vi|Ej,IRk] from P[Vi,Ej,IRk] and 
P[Ej,IRk] using the intelligence preparation of the battlefield information presented in Section 7.3.1.1, 
yields 

$$P[V_i, E_j, IR_k] = P[E_j \mid V_i]\,P[IR_k \mid V_i]\,P[V_i] \qquad (7.74)$$

$$P[E_j, IR_k] = \sum_i P[V_i, E_j, IR_k] \qquad (7.75)$$

$$P[V_i \mid E_j, IR_k] = \frac{P[V_i, E_j, IR_k]}{P[E_j, IR_k]} \qquad (7.76)$$



Next, each P[Vi|Ej,IRk] must be weighted by the reported values of P[Ej] and P[IRk], and then summed 
over j and k to obtain P_C[Vi], which denotes the combined probability interval for Vi. The required 
computations are shown in Tables 7.9, 7.10, and 7.11(a-c). 

TABLE 7.9 Computation of P[Vi,Ej,IRk] from the IPB 

i    j    k    P[Ej|Vi] P[IRk|Vi] P[Vi]
1    1    1    (8.5/12) (0.3) (12/77)
1    1    2    (8.5/12) (0.1) (12/77)
1    1    3    (8.5/12) (0.6) (12/77)
1    2    1    (3.5/12) (0.3) (12/77)
1    2    2    (3.5/12) (0.1) (12/77)
1    2    3    (3.5/12) (0.6) (12/77)
2    2    1    (13.5/34) (0.1) (34/77)
2    2    2    (13.5/34) (0.8) (34/77)
2    2    3    (13.5/34) (0.1) (34/77)
2    3    1    (20.5/34) (0.1) (34/77)
2    3    2    (20.5/34) (0.8) (34/77)
2    3    3    (20.5/34) (0.1) (34/77)
3    1    1    (17/31) (0.5) (31/77)
3    1    2    (17/31) (0.1) (31/77)
3    1    3    (17/31) (0.4) (31/77)
3    3    1    (14/31) (0.5) (31/77)
3    3    2    (14/31) (0.1) (31/77)
3    3    3    (14/31) (0.4) (31/77)

Note: The index i denotes vehicle type (V1, V2, V3) and j denotes emitter type (E1, E2, E3). The factors 
listed show that k = 1, 2, 3 denotes NoIR, IR1, and IR2, respectively. Combinations with zero probability are omitted. 



TABLE 7.10 Computation of P[Ej,IRk] from Information in Table 7.9 

j    k    P[Ej,IRk]
1    1    (8.5/12) (0.3) (12/77) + (17/31) (0.5) (31/77)
1    2    (8.5/12) (0.1) (12/77) + (17/31) (0.1) (31/77)
1    3    (8.5/12) (0.6) (12/77) + (17/31) (0.4) (31/77)
2    1    (3.5/12) (0.3) (12/77) + (13.5/34) (0.1) (34/77)
2    2    (3.5/12) (0.1) (12/77) + (13.5/34) (0.8) (34/77)
2    3    (3.5/12) (0.6) (12/77) + (13.5/34) (0.1) (34/77)
3    1    (20.5/34) (0.1) (34/77) + (14/31) (0.5) (31/77)
3    2    (20.5/34) (0.8) (34/77) + (14/31) (0.1) (31/77)
3    3    (20.5/34) (0.1) (34/77) + (14/31) (0.4) (31/77)
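The products and sums tabulated above lend themselves to a short program. The Python sketch below is illustrative only: the dictionary and function names are assumptions, and the product form of the joint probability treats the emitter and IR observations as conditionally independent given the vehicle type, which is how the tabulated products are built.

```python
P_V = {"V1": 12 / 77, "V2": 34 / 77, "V3": 31 / 77}
P_E_GIVEN_V = {
    "V1": {"E1": 8.5 / 12, "E2": 3.5 / 12, "E3": 0.0},
    "V2": {"E1": 0.0, "E2": 13.5 / 34, "E3": 20.5 / 34},
    "V3": {"E1": 17 / 31, "E2": 0.0, "E3": 14 / 31},
}
P_IR_GIVEN_V = {
    "V1": {"NoIR": 0.3, "IR1": 0.1, "IR2": 0.6},
    "V2": {"NoIR": 0.1, "IR1": 0.8, "IR2": 0.1},
    "V3": {"NoIR": 0.5, "IR1": 0.1, "IR2": 0.4},
}

def joint(v, e, ir):
    """P[V, E, IR] as the product of the two conditionals and the prior."""
    return P_E_GIVEN_V[v][e] * P_IR_GIVEN_V[v][ir] * P_V[v]

def evidence(e, ir):
    """P[E, IR], obtained by summing the joint over vehicle types."""
    return sum(joint(v, e, ir) for v in P_V)

def conditional(v, e, ir):
    """P[V | E, IR]; assumes the pair (E, IR) has nonzero probability."""
    return joint(v, e, ir) / evidence(e, ir)

print(round(conditional("V1", "E1", "NoIR"), 3))
print(round(conditional("V2", "E3", "IR1"), 3))
```

Weighting each conditional by the reported P[Ej] and P[IRk] and summing over j and k then gives the combined interval for each vehicle type.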



TABLE 7.11a Computation of P[V1|Ej,IRk] 

j    k    P[V1|Ej,IRk]    Reported P[Ej]    Reported P[IRk]    P[V1,Ej,IRk]
1    1    0.231           0.9               0.95               0.178
1    2    0.333           0.9               x                  [0, 0.015]
1    3    0.429           0.9               0.05 - x           [0, 0.019]
2    1    0.438           y                 0.95               [0, 0.042]
2    2    0.032           y                 x                  [0, 0.000]
2    3    0.609           y                 0.05 - x           [0, 0.005]

Note: These values are then weighted by reported information and 
summed to yield P_C[V1] = [0.178, 0.259]. Note that 0 ≤ x ≤ 0.05 and 
0 ≤ y ≤ 0.10. 



TABLE 7.11b Computation of P[V2|Ej,IRk] 

j    k    P[V2|Ej,IRk]    Reported P[Ej]    Reported P[IRk]    P[V2,Ej,IRk]
2    1    0.563           y                 0.95               [0, 0.053]
2    2    0.967           y                 x                  [0, 0.005]
2    3    0.391           y                 0.05 - x           [0, 0.002]
3    1    0.227           0.1 - y           0.95               [0, 0.022]
3    2    0.921           0.1 - y           x                  [0, 0.005]
3    3    0.268           0.1 - y           0.05 - x           [0, 0.001]

Note: These values are then weighted by reported information and summed 
to yield P_C[V2] = [0.000, 0.088]. Note that 0 ≤ x ≤ 0.05 and 0 ≤ y ≤ 0.10. 

TABLE 7.11c Computation of P[V3|Ej,IRk] 

j    k    P[V3|Ej,IRk]    Reported P[Ej]    Reported P[IRk]    P[V3,Ej,IRk]
1    1    0.939           0.9               0.95               0.803
1    2    0.095           0.9               x                  [0, 0.004]
1    3    0.889           0.9               0.05 - x           [0, 0.040]
3    1    0.773           0.1 - y           0.95               [0, 0.073]
3    2    0.079           0.1 - y           x                  [0, 0.000]
3    3    0.732           0.1 - y           0.05 - x           [0, 0.004]

Note: These values are then weighted by reported information and 
summed to yield P_C[V3] = [0.803, 0.924]. Note that 0 ≤ x ≤ 0.05 and 0 ≤ y ≤ 0.10. 

These computations result in the updated probability intervals depicted in Figure 7.11: 

0.178 ≤ P_C[V1] ≤ 0.259        (7.77)
0.000 ≤ P_C[V2] ≤ 0.088        (7.78)
0.803 ≤ P_C[V3] ≤ 0.924        (7.79)

FIGURE 7.11 The combined interval estimates result from the probability-evolution computations (compare with 
Figure 7.10). 

Note again the simultaneous choices within these three intervals that satisfy the constraint that the sum of P_C[Vi] over i equals 1. 
Assigning the minimum possible values to P_C[V3] (0.803) and to P_C[V2] (0) gives a maximum possible 
value for P_C[V1] of 0.197. The ratio P_C[V3]/P_C[V1] then equals 0.803/0.197 = 4.08, which provides even 
greater confidence (i.e., reduced risk) than before in a decision that V3 is present at (x_1, y_1). At this point, 
the superposition of probability states could be “popped” and V3 could be declared to be present at 
location (x_1, y_1) as a true conditioning event. 

Also note that an engineering approximation is available to simplify the calculations. Product terms 
that include x and y contribute little to the summations when x and y are on the order of 0.1 or less. 
Ignoring such terms would enable analysts to arrive more quickly at the estimates P_C[V1] ≈ 0.178 and 
P_C[V3] ≈ 0.803, and then P_C[V2] ≈ 0.019 in order to satisfy the known constraint. Note that these 
approximations comprise one set of choices that simultaneously lie within the three probability intervals. 

TABLE 7.12 Combination of Rule-Provided Information with Emitter Information 

                       m_e(E1) = 0.9                     m_e(S) = 0.1
m_r1(V1) = 0.333       m_1(V1) = (0.333) (0.9) = 0.3     -
m_r1(V2) = 0           m_1(V2) = (0) (0.9) = 0           -
m_r1(V3) = 0.667       m_1(V3) = (0.667) (0.9) = 0.6     -
m_r1(S) = 0            -                                 m_1(S) = 0.1

TABLE 7.13 Computation of Support-Plausibility Intervals 

Vehicle    Support    Plausibility
V1         0.3        0.4
V2         0          0.1
V3         0.6        0.7

In addition, other components can be defined. For example, a vehicle-tracking component could use 
standard probability methods to keep track of detected vehicles’ trajectories over time. Another compo- 
nent could identify batteries by using a distance metric to group detected vehicles into candidate batteries 
and track the batteries as entities over time. 

7.3.3.2 The Belief System Response 

After the first message arrives, the emitter belief component assigns a mass of evidence value to state E1 
of 0.9 and a mass of evidence value to state S of 0.1. The vehicle component uses a rule base that is 
constructed by essentially duplicating the calculation of initial posterior conditional probabilities from 
the information in the IPB. Three examples follow. 

Rule 1: If emitter E1 is reported, then m(V1) = 0.333. 

Rule 2: If emitter E1 is reported, then m(V2) = 0. 

Rule 3: If emitter E1 is reported, then m(V3) = 0.667. 

The vehicle component combines the information from the emitter component with the information 
in its rule base by multiplying the rule-provided masses of evidence by an appropriate mass of evidence 
from the emitter component, and by transferring any mass of evidence assigned by the emitter component 
to S as shown in Table 7.12. Note that the combined masses of evidence sum to one, as required; it is 
easy to show that this is always the outcome when transferring the ignorance this way. 

The vehicle component then computes the Support-Plausibility interval for each vehicle, as shown in 
Table 7.13. A comparison of these intervals with those produced by the probability system after receiving 
the first message (see Figure 7.10) shows that they are qualitatively the same. Unlike probability, however, 
belief theory is not equipped with a clear-cut decision-making rule. 
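The weighting of rule-provided masses and the resulting intervals can be sketched in Python. The function names `apply_rules` and `interval` are assumptions for this illustration, and the sketch relies on two simplifications that hold in this example: the rule-provided masses sum to one, and the only focal sets are the single vehicles and S.

```python
def apply_rules(rule_masses, report_mass):
    """Scale rule-provided masses by the reported mass; move the rest to S."""
    masses = {v: m * report_mass for v, m in rule_masses.items()}
    masses["S"] = 1.0 - report_mass  # ignorance carried over from the report
    return masses

def interval(vehicle, masses):
    """Support and plausibility when focal sets are single vehicles or S."""
    return masses[vehicle], masses[vehicle] + masses["S"]

# Rules 1 through 3 for a reported E1, weighted by the emitter mass 0.9.
m1 = apply_rules({"V1": 0.333, "V2": 0.0, "V3": 0.667}, 0.9)
for v in ("V1", "V2", "V3"):
    low, high = interval(v, m1)
    print(v, round(low, 1), round(high, 1))
```

Because the rule masses sum to one, the weighted masses plus the mass moved to S also sum to one.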

After the second message arrives, the IR belief component assigns a mass of evidence value to state 
NoIR of 0.95 and a mass of evidence value to state S of 0.05. The vehicle component also contains rules 
that relate MASINT reports to vehicles. Three examples follow. 

Rule 1: If NoIR is reported, then m(V1) = 0.160. 

Rule 2: If NoIR is reported, then m(V2) = 0.151. 

Rule 3: If NoIR is reported, then m(V3) = 0.689. 

The vehicle component combines the information from the second message with the information in 
its rule base by multiplying the rule-provided masses of evidence by an appropriate mass of evidence 
from the message, and by transferring the mass of evidence assigned by the IR component to S as shown 
in Table 7.14. 

TABLE 7.14 Combination of Rule-Provided Information with IR Information 

                     m_ir(NoIR) = 0.95                   m_ir(S) = 0.05 
m_r2(V1) = 0.160     m_2(V1) = (0.160)(0.95) = 0.152     - 
m_r2(V2) = 0.151     m_2(V2) = (0.151)(0.95) = 0.143     - 
m_r2(V3) = 0.689     m_2(V3) = (0.689)(0.95) = 0.655     - 
m_r2(S) = 0          -                                   m_2(S) = 0.05 

The vehicle component now combines the masses of evidence from Tables 7.12 and 7.14 using the 
standard Dempster-Shafer combination rule, as shown in Table 7.15. The inconsistency values from 
Table 7.15 are summed to calculate the normalization factor (1 - k) = (1 - 0.4164) = 0.5836. Then the 
new combined masses of evidence are calculated as shown in Table 7.16. Finally, the Support-Plausibility 
interval for each vehicle is calculated as shown in Table 7.17. 

TABLE 7.15 Combination of Masses of Evidence Derived from Messages 1 and 2 

                   m_1(V1) = 0.3      m_1(V2) = 0      m_1(V3) = 0.6      m_1(S) = 0.1 
m_2(V1) = 0.152    m_c(V1) = 0.0456   k = 0            k = 0.0912         m_c(V1) = 0.0152 
m_2(V2) = 0.143    k = 0.0429         m_c(V2) = 0      k = 0.0858         m_c(V2) = 0.0143 
m_2(V3) = 0.655    k = 0.1965         k = 0            m_c(V3) = 0.3930   m_c(V3) = 0.0655 
m_2(S) = 0.05      m_c(V1) = 0.0150   m_c(V2) = 0      m_c(V3) = 0.0300   m_c(S) = 0.0050 

TABLE 7.16 Calculation of Normalized Combined Masses of Evidence 

m_c(V1)    (0.0456 + 0.0152 + 0.0150)/0.5836 = 0.1299 
m_c(V2)    (0.0143)/0.5836 = 0.0245 
m_c(V3)    (0.3930 + 0.0300 + 0.0655)/0.5836 = 0.8370 
m_c(S)     (0.0050)/0.5836 = 0.0086 

TABLE 7.17 Computation of New Support-Plausibility Intervals 

Vehicle    Support    Plausibility 
V1         0.1299     0.1385 
V2         0.0245     0.0331 
V3         0.8370     0.8456 
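
The arithmetic behind Tables 7.15 through 7.17 can be checked with a short script. The following is a minimal sketch, not the chapter's implementation: the function names (`combine`, `support`, `plausibility`) and the representation of each focal element as a `frozenset` are illustrative choices, and S is assumed to stand for the whole frame {V1, V2, V3}. The input masses are the rounded values printed in Tables 7.12 and 7.14.

```python
from itertools import product

# Assumption: S (ignorance) is the full frame of discernment {V1, V2, V3}.
V1, V2, V3 = frozenset({"V1"}), frozenset({"V2"}), frozenset({"V3"})
S = V1 | V2 | V3


def combine(m_a, m_b):
    """Dempster's rule: multiply masses, pool products by set intersection,
    collect products with empty intersection as the inconsistency k,
    then renormalize by (1 - k)."""
    pooled, k = {}, 0.0
    for (a, mass_a), (b, mass_b) in product(m_a.items(), m_b.items()):
        common = a & b
        if common:
            pooled[common] = pooled.get(common, 0.0) + mass_a * mass_b
        else:
            k += mass_a * mass_b
    if k >= 1.0:
        raise ValueError("sources are totally inconsistent; rule undefined")
    return {h: v / (1.0 - k) for h, v in pooled.items()}, k


def support(m, hypothesis):
    """Sum of masses committed to subsets of the hypothesis."""
    return sum(v for h, v in m.items() if h <= hypothesis)


def plausibility(m, hypothesis):
    """Sum of masses not contradicting the hypothesis."""
    return sum(v for h, v in m.items() if h & hypothesis)


# Masses after message 1 (Table 7.12) and message 2 (Table 7.14).
m1 = {V1: 0.3, V2: 0.0, V3: 0.6, S: 0.1}
m2 = {V1: 0.152, V2: 0.143, V3: 0.655, S: 0.05}

mc, k = combine(m1, m2)
for name, h in (("V1", V1), ("V2", V2), ("V3", V3)):
    print(name, round(support(mc, h), 4), round(plausibility(mc, h), 4))
```

Because every focal element here is either a single vehicle or the whole frame, the plausibility of each vehicle is simply its support plus the residual mass on S, which is the pattern visible in Table 7.17.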

A comparison of these intervals with those produced by the probability system, after receiving the 
second message (see Figure 7.10), shows that they are again qualitatively the same. Even though belief 
theory is not equipped with a clear-cut decision-making rule, the situation that presents itself here clearly 
justifies a decision that vehicle 3 has been detected. 

7.3.4 Summary 

This chapter has introduced all of the key concepts and has provided a vehicle-identification example to 
show that components can characterize local domains. These components can communicate with each 
other to accumulate evidence up an abstraction hierarchy. New spaces can be formed in ways that relate 
to the earlier spaces but that account for the differences in the level of abstraction, as well as for the 
amount and kinds of evidence available. Partial information can be collected in local domains, and the 
domains can eventually result in a situation level. However, capturing all of the information in a single 
space is unnecessary and impractical. The changes in the spaces track the changes in the levels of our 
human understanding. 
7.4 Contrasts and Conclusion 



Other authors have described related ideas. Peter Cheeseman has argued strongly for adoption of Bayesian 
techniques in preference to alternative methods of combining evidence, 35 and Judea Pearl 31 has developed 
techniques for implementing Bayesian methods in a distributed network environment. Here, a collection 
of spaces has been proposed as an idea that both clarifies the theoretical underpinnings of 
data fusion methods and makes their implementation practical. 

The highly modular approach that is described herein is well suited to a modern component-based 
software design pattern. Multiple processes, each matched to the computations of individual components, 
could lead naturally to real-time systems that solve real-world data fusion problems. 



Appendix 7.A The Axiomatic Definition of Probability 

Formally, a probability space is a three-tuple, $(S, \Omega, P)$, where 

• S is a set of observable outcomes from some experiment of interest (the totality of outcomes). 

• $\Omega$ is a collection of subsets of S with the following properties: 

1. If A is an element of $\Omega$, then the complement of A (with respect to S) is also an element of $\Omega$. 

2. If both A and B are elements of $\Omega$, then the union of A and B is also an element of $\Omega$. 

3. If $A_i$ are elements of $\Omega$ for i = 1, 2, ..., then any countable union of $A_i$ is also an element of $\Omega$. 

• P is a set-function defined on the elements of $\Omega$ (termed events) that has the following properties: 

1. To each event A in $\Omega$ there is assigned a nonnegative real number, P(A) (that is, $0 \le P(A)$). 

2. P(S) = 1. 

3. For A and B both in $\Omega$, if the intersection of A and B is empty, then P(A or B) = P(A) + P(B), 
and, if the intersection of $A_i$ and $A_j$ is empty when $i \ne j$, then $P(\bigcup_i A_i) = \sum_i P(A_i)$. 

The axioms presented here are in essentially the same form as proposed first (in 1933) by the Russian 
mathematician Andrei Nikolaevich Kolmogorov. 36 
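
For a finite outcome set the axioms can be verified exhaustively. The sketch below is only an illustration under stated assumptions: the six-outcome set (a fair die), the choice of the power set as the collection of events, and the equal-weight measure are all hypothetical and do not come from the chapter.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical experiment: one roll of a fair die.
S = frozenset({1, 2, 3, 4, 5, 6})


def power_set(outcomes):
    """All subsets of a finite set; the largest possible collection of events."""
    items = sorted(outcomes)
    return {
        frozenset(c)
        for r in range(len(items) + 1)
        for c in combinations(items, r)
    }


OMEGA = power_set(S)


def P(event):
    """Equal-weight measure; exact rational arithmetic avoids rounding."""
    return Fraction(len(event), len(S))


# Properties of the event collection (countable unions reduce to finite ones here).
closed_under_complement = all((S - A) in OMEGA for A in OMEGA)
closed_under_union = all((A | B) in OMEGA for A in OMEGA for B in OMEGA)

# Properties of the set function.
nonnegative = all(P(A) >= 0 for A in OMEGA)
normalized = P(S) == 1
additive = all(
    P(A | B) == P(A) + P(B)
    for A in OMEGA
    for B in OMEGA
    if not (A & B)
)

print(closed_under_complement, closed_under_union, nonnegative, normalized, additive)
```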
8.1 Introduction 



In tracking targets with less-than-unity probability of detection in the presence of false alarms (clutter), 
data association — deciding which of the received multiple measurements to use to update each track — 
is crucial. A number of algorithms have been developed to solve this problem. 14 Two simple solutions 
are the Strongest Neighbor Filter (SNF) and the Nearest Neighbor Filter (NNF). In the SNF, the signal 
with the highest intensity among the validated measurements (in a gate) is used for track update and 
the others are discarded. In the NNF, the measurement closest to the predicted measurement is used. 
While these simple techniques work reasonably well with benign targets in sparse scenarios, they begin 
to fail as the false alarm rate increases or with low observable (low probability of target detection) 
maneuvering targets. 5,6 Instead of using only one measurement among the received ones and discarding 
the others, an alternative approach is to use all of the validated measurements with different weights 
(probabilities), known as Probabilistic Data Association (PDA). 3 The standard PDA and its numerous 
improved versions have been shown to be very effective in tracking a single target in clutter. 6,7 
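
The two single-measurement selection rules described above can be sketched in a few lines. This is a minimal illustration, not code from the chapter: the function names are hypothetical, and "closest" is assumed to mean the smallest distance normalized by the innovation covariance (the text does not say which distance the NNF uses).

```python
import numpy as np


def nearest_neighbor(measurements, z_pred, S):
    """NNF selection: index of the validated measurement closest to the
    predicted measurement. Assumption: distance is normalized by the
    innovation covariance S (a Mahalanobis-type distance)."""
    diffs = np.asarray(measurements, dtype=float) - np.asarray(z_pred, dtype=float)
    d2 = np.einsum("ij,jk,ik->i", diffs, np.linalg.inv(S), diffs)
    return int(np.argmin(d2))


def strongest_neighbor(intensities):
    """SNF selection: index of the validated measurement with the highest
    signal intensity."""
    return int(np.argmax(intensities))


# Hypothetical gate contents: three validated 2-D measurements.
z = [[1.0, 0.0], [0.2, 0.1], [3.0, 3.0]]
amplitude = [5.0, 1.0, 9.0]
print(nearest_neighbor(z, [0.0, 0.0], np.eye(2)), strongest_neighbor(amplitude))
```

In this made-up gate the two rules pick different measurements, which is the kind of ambiguity that motivates weighting all validated measurements instead of committing to one.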

Data association becomes more difficult with multiple targets where the tracks compete for 
measurements. Here, in addition to a track validating multiple measurements as in the single target case, a 
measurement itself can be validated by multiple tracks (i.e., contention occurs among tracks for 
measurements). Many algorithms exist to handle this contention. The Joint Probabilistic Data Association 
(JPDA) algorithm is used to track multiple targets by evaluating the measurement-to-track association 
probabilities and combining them to find the state estimate. 3 The Multiple-Hypothesis Tracking (MHT) 
algorithm is a more powerful (but much more complex) approach that handles the multitarget tracking problem 
by evaluating the likelihood that there is a target given a sequence of measurements. 4 In the tracking 
benchmark problem 8 designed to compare the performance of different algorithms for tracking highly 
maneuvering targets in the presence of electronic countermeasures, the PDA-based estimator, in 
conjunction with the Interacting Multiple Model (IMM) estimator, yielded one of the best solutions. Its 
performance was comparable to that of the MHT algorithm. 6,9 

This chapter presents an overview of the PDA technique and its application for different target-tracking 
scenarios. Section 8.2 summarizes the PDA technique. Section 8.3 describes the use of the PDA technique 
for tracking low observable targets with passive sonar measurements. This target motion analysis (TMA) 
is an application of the PDA technique, in conjunction with the maximum likelihood (ML) approach 
for target motion parameter estimation via a batch procedure. Section 8.4 presents the use of the PDA 
technique for tracking highly maneuvering targets and for radar resource management. It illustrates the 
application of the PDA technique for recursive state estimation using the IMMPDAF. Section 8.5 presents 
a state-of-the-art sliding-window (which can also expand and contract) parameter estimator using the 
PDA approach for tracking the state of a maneuvering target using measurements from an electro-optical 
sensor. This, while still a batch procedure, offers the flexibility of varying the batches depending on the 
estimation results. 

8.2 Probabilistic Data Association 



The PDA algorithm calculates, in real time, the probability that each validated measurement is attributable 
to the target of interest. This probabilistic (Bayesian) information is used in a tracking filter, the PDA 
filter (PDAF), which accounts for the measurement origin uncertainty. 

8.2.1 Assumptions 

The following assumptions are made to obtain the recursive PDAF state estimator (tracker): 

• There is only one target of interest whose state evolves according to a dynamic equation driven 
by process noise. 

• The track has been initialized. 

• The past information about the target is summarized approximately by 

$$p[x(k) \mid Z^{k-1}] = \mathcal{N}[x(k);\, \hat{x}(k|k-1),\, P(k|k-1)] \tag{8.1}$$

where $\mathcal{N}[x(k);\, \hat{x}(k|k-1),\, P(k|k-1)]$ denotes the normal probability density function (pdf) with argument 
$x(k)$, mean $\hat{x}(k|k-1)$, and covariance matrix $P(k|k-1)$. This assumption of the PDAF is similar 
to the GPB1 (Generalized Pseudo-Bayesian) approach, 10 where a single “lumped” state estimate 
is a quasi-sufficient statistic. 

• At each time, a validation region as in Reference 3 is set up (see Equation 8.4). 

• Among the possibly several validated measurements, at most one of them can be target-originated — 
if the target was detected and the corresponding measurement fell into the validation region. 

• The remaining measurements are assumed to be false alarms or clutter and are modeled as 
independent identically distributed (iid) measurements with uniform spatial distribution. 

• The target detections occur independently over time with known probability $P_D$. 

These assumptions enable a state estimation scheme to be obtained, which is almost as simple as the 
Kalman filter, but much more effective in clutter. 

8.2.2 The PDAF Approach 

The PDAF uses a decomposition of the estimation with respect to the origin of each element of the latest 
set of validated measurements, denoted as 



$$Z(k) = \{z_i(k)\}_{i=1}^{m(k)} \tag{8.2}$$

where $z_i(k)$ is the i-th validated measurement and m(k) is the number of measurements in the validation 
region at time k. 

The cumulative set (sequence) of measurements* is 

$$Z^k = \{Z(j)\}_{j=1}^{k} \tag{8.3}$$



8.2.3 Measurement Validation 

From the Gaussian assumption (Equation 8.1), the validation region is the elliptical region 



$$\mathcal{V}(k,\gamma) = \left\{ z : [z - \hat{z}(k|k-1)]'\, S(k)^{-1}\, [z - \hat{z}(k|k-1)] \le \gamma \right\} \tag{8.4}$$

where $\gamma$ is the gate threshold and 

$$S(k) = H(k)\, P(k|k-1)\, H(k)' + R(k) \tag{8.5}$$

is the covariance of the innovation corresponding to the true measurement (the prime denotes 
transposition). The volume of the validation region (Equation 8.4) is 

$$V(k) = c_{n_z}\, |\gamma S(k)|^{1/2} = c_{n_z}\, \gamma^{n_z/2}\, |S(k)|^{1/2} \tag{8.6}$$

where the coefficient $c_{n_z}$ depends on the dimension $n_z$ of the measurement (it is the volume of the 
$n_z$-dimensional unit hypersphere: $c_1 = 2$, $c_2 = \pi$, $c_3 = 4\pi/3$, etc.). 

* When the running index is a time argument, a sequence exists; otherwise it is a set where the order is not 
relevant. The context should indicate which is the case. 
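
The gate test and the gate volume can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, NumPy is assumed to be available, and the closed form used for the unit-hypersphere volume is the standard gamma-function expression, which reproduces the values 2, π, and 4π/3 quoted in the text.

```python
import math

import numpy as np


def unit_hypersphere_volume(n_z):
    """Coefficient c_{n_z}: volume of the n_z-dimensional unit hypersphere."""
    return math.pi ** (n_z / 2) / math.gamma(n_z / 2 + 1)


def validate(measurements, z_pred, S, gamma):
    """Keep the measurements inside the elliptical validation region:
    (z - z_pred)' S^{-1} (z - z_pred) <= gamma."""
    z = np.asarray(measurements, dtype=float)
    diffs = z - np.asarray(z_pred, dtype=float)
    d2 = np.einsum("ij,ij->i", diffs @ np.linalg.inv(S), diffs)
    return z[d2 <= gamma]


def gate_volume(S, gamma):
    """Volume of the validation region: c_{n_z} * gamma^(n_z/2) * |S|^(1/2)."""
    S = np.asarray(S, dtype=float)
    n_z = S.shape[0]
    return unit_hypersphere_volume(n_z) * gamma ** (n_z / 2) * math.sqrt(np.linalg.det(S))


# Hypothetical 2-D example: unit innovation covariance, gate threshold 4.
S_example = np.eye(2)
candidates = [[1.0, 1.0], [3.0, 0.0], [0.0, -1.9]]
validated = validate(candidates, [0.0, 0.0], S_example, gamma=4.0)
print(len(validated), gate_volume(S_example, 4.0))
```

With a unit covariance the gate is a circle of radius √γ, so the volume reduces to the familiar area πγ, which gives a quick sanity check on the formula.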

8.2.4 The State Estimation 

In view of the assumptions listed, the association events 

$$\theta_i(k) = \begin{cases} \{z_i(k) \text{ is the target-originated measurement}\}, & i = 1, \ldots, m(k) \\ \{\text{none of the measurements is target originated}\}, & i = 0 \end{cases} \tag{8.7}$$

are mutually exclusive and exhaustive for $m(k) \ge 1$. 

Using the total probability theorem 10 with regard to the above events, the conditional mean of the 
state at time k can be written as 

$$\begin{aligned} \hat{x}(k|k) &= E[x(k) \mid Z^k] \\ &= \sum_{i=0}^{m(k)} E[x(k) \mid \theta_i(k), Z^k]\, P\{\theta_i(k) \mid Z^k\} \\ &= \sum_{i=0}^{m(k)} \hat{x}_i(k|k)\, \beta_i(k) \end{aligned} \tag{8.8}$$

where $\hat{x}_i(k|k)$ is the updated state conditioned on the event that the i-th validated measurement is correct, 
and 

$$\beta_i(k) \triangleq P\{\theta_i(k) \mid Z^k\} \tag{8.9}$$

is the conditional probability of this event (the association probability), obtained from the PDA 
procedure presented in the next subsection. 

The estimate conditioned on measurement i being correct is 

$$\hat{x}_i(k|k) = \hat{x}(k|k-1) + W(k)\, \nu_i(k), \qquad i = 1, \ldots, m(k) \tag{8.10}$$

where the corresponding innovation is 

$$\nu_i(k) = z_i(k) - \hat{z}(k|k-1) \tag{8.11}$$

The gain W(k) is the same as in the standard filter 

$$W(k) = P(k|k-1)\, H(k)'\, S(k)^{-1} \tag{8.12}$$

since, conditioned on $\theta_i(k)$, there is no measurement origin uncertainty. 

For i = 0 (i.e., if none of the measurements is correct) or m(k) = 0 (i.e., there is no validated 
measurement) 

$$\hat{x}_0(k|k) = \hat{x}(k|k-1) \tag{8.13}$$

8.2.5 The State and Covariance Update 

Combining Equations 8.10 and 8.13 into Equation 8.8 yields the state update equation of the PDAF 

$$\hat{x}(k|k) = \hat{x}(k|k-1) + W(k)\, \nu(k) \tag{8.14}$$

where the combined innovation is 

$$\nu(k) = \sum_{i=1}^{m(k)} \beta_i(k)\, \nu_i(k) \tag{8.15}$$



The covariance associated with the updated state is 

$$P(k|k) = \beta_0(k)\, P(k|k-1) + [1 - \beta_0(k)]\, P^c(k|k) + \tilde{P}(k) \tag{8.16}$$

where the covariance of the state updated with the correct measurement is 3 

$$P^c(k|k) = P(k|k-1) - W(k)\, S(k)\, W(k)' \tag{8.17}$$

and the spread of the innovations term (similar to the spread of the means term in a mixture 10 ) is 

$$\tilde{P}(k) = W(k) \left[ \sum_{i=1}^{m(k)} \beta_i(k)\, \nu_i(k)\, \nu_i(k)' - \nu(k)\, \nu(k)' \right] W(k)' \tag{8.18}$$
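
The state and covariance update can be sketched compactly with NumPy. This is an illustrative sketch rather than a reference implementation: the function name `pdaf_update` and its argument layout are hypothetical, and the association probabilities are taken as given inputs because their computation is outside this subsection. `betas[0]` is assumed to hold the probability that no validated measurement is target originated, and `betas[i]` the probability for measurement i.

```python
import numpy as np


def pdaf_update(x_pred, P_pred, z_pred, S, H, measurements, betas):
    """PDAF state and covariance update for one scan.

    x_pred, P_pred : predicted state and its covariance
    z_pred, S      : predicted measurement and innovation covariance
    measurements   : validated measurements, one per row (may be empty)
    betas          : association probabilities, betas[0] for "none correct"
    """
    betas = np.asarray(betas, dtype=float)
    z_pred = np.asarray(z_pred, dtype=float)
    z = np.asarray(measurements, dtype=float).reshape(-1, z_pred.shape[0])

    W = P_pred @ H.T @ np.linalg.inv(S)        # filter gain
    nus = z - z_pred                           # individual innovations, one per row
    nu = betas[1:] @ nus                       # combined innovation
    x_upd = x_pred + W @ nu                    # state update

    P_c = P_pred - W @ S @ W.T                 # covariance if the measurement were known correct
    spread = (nus.T * betas[1:]) @ nus - np.outer(nu, nu)
    P_tilde = W @ spread @ W.T                 # spread of the innovations term
    P_upd = betas[0] * P_pred + (1.0 - betas[0]) * P_c + P_tilde
    return x_upd, P_upd


# Hypothetical scalar example: unit prior variance, unit measurement noise.
x0, P0 = np.array([0.0]), np.array([[1.0]])
H0, S0 = np.array([[1.0]]), np.array([[2.0]])
x_two, P_two = pdaf_update(x0, P0, [0.0], S0, H0, [[1.0], [-1.0]], [0.2, 0.4, 0.4])
print(x_two, P_two)
```

In the example the two measurements are symmetric about the prediction, so the combined innovation vanishes and the state does not move, but the spread term keeps the updated variance above the value a single unambiguous measurement would give. That is the mechanism by which the filter accounts for measurement origin uncertainty.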



8.2.6 The Prediction Equations 

The prediction of the state and measurement to k + 1 is done as in the standard filter, i.e., 

$$\hat{x}(k+1|k) = F(k)\, \hat{x}(k|k) \tag{8.19}$$

$$\hat{z}(k+1|k) = H(k+1)\, \hat{x}(k+1|k) \tag{8.20}$$

The covariance of the predicted state is, similarly, 

$$P(k+1|k) = F(k)\, P(k|k)\, F(k)' + Q(k) \tag{8.21}$$

where P(k|k) is given by Equation 8.16. 

The innovation covariance (for the correct measurement) is, again, as in the standard filter 

$$S(k+1) = H(k+1)\, P(k+1|k)\, H(k+1)' + R(k+1) \tag{8.22}$$
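
The prediction step is the same as in a standard Kalman filter and can be sketched directly. The function name `pdaf_predict`, the argument names, and the constant-velocity example matrices are illustrative assumptions, not taken from the chapter.

```python
import numpy as np


def pdaf_predict(x_upd, P_upd, F, Q, H_next, R_next):
    """One-step prediction of state, measurement, and their covariances."""
    x_pred = F @ x_upd                              # predicted state
    P_pred = F @ P_upd @ F.T + Q                    # predicted state covariance
    z_pred = H_next @ x_pred                        # predicted measurement
    S_next = H_next @ P_pred @ H_next.T + R_next    # innovation covariance
    return x_pred, P_pred, z_pred, S_next


# Hypothetical constant-velocity model: state = [position, velocity], unit time step.
F_cv = np.array([[1.0, 1.0], [0.0, 1.0]])
H_pos = np.array([[1.0, 0.0]])                      # position-only measurement
x_p, P_p, z_p, S_p = pdaf_predict(
    np.array([0.0, 1.0]), np.eye(2), F_cv, np.zeros((2, 2)), H_pos, np.array([[0.5]])
)
print(x_p, z_p, S_p)
```

The only PDAF-specific element is the input covariance: `P_upd` should be the updated covariance that includes the spread of the innovations term, not the covariance of a standard filter update.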